    # agent runs tests with coverage
    $ go test -cover ./...
    ok example.com/myproject 0.003s coverage: 85.7%
    $ go build -o server .

Can AI agents use Go?
The Go programming language toolchain. Agents use it to build binaries, run tests, manage modules, and format code.
See the latest run →
Go eval results by model
| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 100% | 4.0 | 8.4k |
Go task results by model
| Task | Difficulty | Description | gpt-5-nano |
|---|---|---|---|
| init-and-run | easy | Initialize a Go module, create a main.go that prints 'Hello, World!', and run it with go run. | ✓ 3 turns |
| test-with-coverage | medium | Create a Go module with a file math.go containing a function Add(a, b int) int that returns the sum. Create math_test.go with a test for Add. Run the tests with coverage. | ✓ 5 turns |
| format-and-vet | easy | Create a Go module with a deliberately poorly-formatted main.go file. Run go fmt to fix formatting, then run go vet to check for issues. | ✓ 4 turns |
| build-binary | medium | Create a Go module with a main.go that accepts a --name flag (default 'World') and prints 'Hello, <name>!'. Build it into a binary called 'greeter', then run the binary with --name=Agent. | ✓ 4 turns |
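
For orientation, here is a minimal sketch of a main.go that would satisfy the build-binary task as described. The page does not show the code the agent actually wrote, so the `greeting` helper and the flag help text are illustrative assumptions.

```go
// main.go: a minimal sketch of the program the build-binary task describes.
// The helper name and help text are assumptions, not the agent's real output.
package main

import (
	"flag"
	"fmt"
)

// greeting builds the message; it is separate from main so it can be tested.
func greeting(name string) string {
	return fmt.Sprintf("Hello, %s!", name)
}

func main() {
	// The standard flag package accepts both -name and --name.
	name := flag.String("name", "World", "name to greet")
	flag.Parse()
	fmt.Println(greeting(*name))
}
```

Inside an initialized module, `go build -o greeter .` followed by `./greeter --name=Agent` would cover the `ran`, `file_exists`, and `output_contains` checks listed for this task.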
Task suite source (51 lines · YAML)

    - id: init-and-run
      intent: Initialize a Go module, create a main.go that prints 'Hello, World!',
        and run it with go run.
      assert:
        - ran: go mod init
        - ran: go run
        - file_exists: go.mod
        - file_exists: main.go
        - output_contains: Hello, World!
      setup: []
      max_turns: 4
      difficulty: easy
      category: basics
    - id: test-with-coverage
      intent: Create a Go module with a file math.go containing a function Add(a, b
        int) int that returns the sum. Create math_test.go with a test for Add. Run
        the tests with coverage.
      assert:
        - ran: go test.*-cover|go test.*-coverprofile
        - file_exists: math.go
        - file_exists: math_test.go
        - exit_code: 0
      setup: []
      max_turns: 6
      difficulty: medium
      category: testing
    - id: format-and-vet
      intent: Create a Go module with a deliberately poorly-formatted main.go file.
        Run go fmt to fix formatting, then run go vet to check for issues.
      assert:
        - ran: go fmt|gofmt
        - ran: go vet
        - file_exists: main.go
      setup: []
      max_turns: 5
      difficulty: easy
      category: workflow
    - id: build-binary
      intent: Create a Go module with a main.go that accepts a --name flag (default
        'World') and prints 'Hello, <name>!'. Build it into a binary called
        'greeter', then run the binary with --name=Agent.
      assert:
        - ran: go build
        - file_exists: greeter
        - ran: ./greeter.*--name
        - output_contains: Agent
      setup: []
      max_turns: 8
      difficulty: medium
      category: build

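
The `ran` and `output_contains` assertions above look like pattern and substring checks against an agent's transcript. The sketch below shows one way such checks could be evaluated. It assumes `ran` values are unanchored regular expressions matched against each recorded command. The page does not document how @cliwatch/cli-bench implements these checks, so the function names and the sample transcript are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ranMatches reports whether any recorded command matches the pattern.
// Assumption: a `ran` value is an unanchored regular expression.
func ranMatches(pattern string, commands []string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	for _, c := range commands {
		if re.MatchString(c) {
			return true, nil
		}
	}
	return false, nil
}

// outputContains reports whether the captured output includes want.
func outputContains(output, want string) bool {
	return strings.Contains(output, want)
}

func main() {
	// Hypothetical transcript of an agent run for test-with-coverage.
	commands := []string{
		"go mod init example.com/myproject",
		"go test -cover ./...",
	}
	ok, err := ranMatches(`go test.*-cover|go test.*-coverprofile`, commands)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println("ran go test with coverage:", ok)
	fmt.Println("output check:", outputContains("Hello, World!\n", "Hello, World!"))
}
```

Under this reading, a pattern such as `go fmt|gofmt` would accept either spelling of the formatter command, which may be why the suite lists both.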
Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.
What you get with CLIWatch
Everything below is running live for Go — see the latest run. Set up the same for your CLI in minutes.
| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |
CI & PR Comments
Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
Track Over Time
See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

    thresholds:
      claude-sonnet-4-5: 80%
      gpt-4.1: 75%
      claude-haiku-4-5: 60%

Quality Gates
Set per-model pass rate thresholds. CI fails if evals drop below your targets.
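
As a rough illustration of the gating logic, the sketch below compares per-model pass rates against thresholds and exits non-zero when any model falls short. It is not CLIWatch's implementation. The `failingModels` function, the mapping of display names such as "Sonnet 4.5" to IDs such as `claude-sonnet-4-5`, and the rule that a missing result counts as a failure are all assumptions.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// failingModels returns, sorted by name, the models whose pass rate is
// below its threshold. Assumption: a model that has a threshold but no
// recorded result is treated as failing.
func failingModels(passRates, thresholds map[string]float64) []string {
	var failed []string
	for model, minRate := range thresholds {
		rate, ok := passRates[model]
		if !ok || rate < minRate {
			failed = append(failed, model)
		}
	}
	sort.Strings(failed)
	return failed
}

func main() {
	// Thresholds from the sample config; pass rates from the sample table.
	thresholds := map[string]float64{
		"claude-sonnet-4-5": 80,
		"gpt-4.1":           75,
		"claude-haiku-4-5":  60,
	}
	passRates := map[string]float64{
		"claude-sonnet-4-5": 95,
		"gpt-4.1":           80,
		"claude-haiku-4-5":  65,
	}

	failed := failingModels(passRates, thresholds)
	if len(failed) > 0 {
		fmt.Println("quality gate failed for:", failed)
		os.Exit(1) // a non-zero exit status is what fails a CI job
	}
	fmt.Println("all quality gates passed")
}
```

With the sample numbers every model clears its threshold, so the gate passes. If GPT-4.1 dropped to 70%, it would fall below its 75% threshold and the sketch would exit with status 1.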
Get this for your CLI
Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.