```shell
# agent runs tests with coverage
$ go test -cover ./...
ok  example.com/myproject  0.003s  coverage: 85.7%

$ go build -o server .
```

Can AI agents use Go?

The Go programming language toolchain. Agents use it to build binaries, run tests, manage modules, and format code.

See the latest run →
100% overall pass rate · 1 model tested · 4 tasks · v1.24.1 · 3/6/2026

Go eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 100% | 4.0 | 8.4k |

Go task results by model

| Task | Difficulty | Description | gpt-5-nano |
|---|---|---|---|
| init-and-run | easy | Initialize a Go module, create a main.go that prints 'Hello, World!', and run it with go run. | 3 turns |
| test-with-coverage | medium | Create a Go module with a file math.go containing a function Add(a, b int) int that returns the sum. Create math_test.go with a test for Add. Run the tests with coverage. | 5 turns |
| format-and-vet | easy | Create a Go module with a deliberately poorly-formatted main.go file. Run go fmt to fix formatting, then run go vet to check for issues. | 4 turns |
| build-binary | medium | Create a Go module with a main.go that accepts a --name flag (default 'World') and prints 'Hello, <name>!'. Build it into a binary called 'greeter', then run the binary with --name=Agent. | 4 turns |
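
The build-binary task can be satisfied by a program along these lines. This is a minimal sketch, not the graded reference solution; the `greet` helper is an illustrative factoring, and only the flag name, default, and output format come from the task description:

```go
// main.go: greeter with a --name flag (default "World"),
// as described in the build-binary task.
package main

import (
	"flag"
	"fmt"
)

// greet formats the greeting for a given name.
func greet(name string) string {
	return fmt.Sprintf("Hello, %s!", name)
}

func main() {
	// --name defaults to "World" per the task description.
	name := flag.String("name", "World", "name to greet")
	flag.Parse()
	fmt.Println(greet(*name))
}
```

Built with `go build -o greeter .` and run as `./greeter --name=Agent`, this prints `Hello, Agent!`, which is what the task's asserts check for.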
Task suite source (51 lines · YAML)

```yaml
- id: init-and-run
  intent: Initialize a Go module, create a main.go that prints 'Hello, World!',
    and run it with go run.
  assert:
    - ran: go mod init
    - ran: go run
    - file_exists: go.mod
    - file_exists: main.go
    - output_contains: Hello, World!
  setup: []
  max_turns: 4
  difficulty: easy
  category: basics
- id: test-with-coverage
  intent: Create a Go module with a file math.go containing a function Add(a, b
    int) int that returns the sum. Create math_test.go with a test for Add. Run
    the tests with coverage.
  assert:
    - ran: go test.*-cover|go test.*-coverprofile
    - file_exists: math.go
    - file_exists: math_test.go
    - exit_code: 0
  setup: []
  max_turns: 6
  difficulty: medium
  category: testing
- id: format-and-vet
  intent: Create a Go module with a deliberately poorly-formatted main.go file.
    Run go fmt to fix formatting, then run go vet to check for issues.
  assert:
    - ran: go fmt|gofmt
    - ran: go vet
    - file_exists: main.go
  setup: []
  max_turns: 5
  difficulty: easy
  category: workflow
- id: build-binary
  intent: Create a Go module with a main.go that accepts a --name flag (default
    'World') and prints 'Hello, <name>!'. Build it into a binary called
    'greeter', then run the binary with --name=Agent.
  assert:
    - ran: go build
    - file_exists: greeter
    - ran: ./greeter.*--name
    - output_contains: Agent
  setup: []
  max_turns: 8
  difficulty: medium
  category: build
```
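
The test-with-coverage task expects the work split across math.go and math_test.go; the sketch below inlines both in one runnable file for illustration, with the test shown as a comment. Only the `Add` signature comes from the task; everything else is an assumption:

```go
// Sketch of the test-with-coverage task. In the real task, Add
// lives in math.go and the test lives in math_test.go.
package main

import "fmt"

// Add returns the sum of a and b (the body math.go would hold).
func Add(a, b int) int {
	return a + b
}

// math_test.go would contain something like:
//
//	func TestAdd(t *testing.T) {
//		if got := Add(2, 3); got != 5 {
//			t.Errorf("Add(2, 3) = %d, want 5", got)
//		}
//	}

func main() {
	fmt.Println(Add(2, 3)) // prints 5
}
```

Running `go test -cover ./...` in such a module reports a coverage percentage, which is what the `ran: go test.*-cover|go test.*-coverprofile` assert looks for.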

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.
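
The `ran:` asserts in the task suite are regular-expression patterns matched against the commands the agent executed. A sketch of that matching follows; `ranAssert` is a hypothetical helper, not cli-bench's actual API, and the harness may implement this differently:

```go
package main

import (
	"fmt"
	"regexp"
)

// ranAssert reports whether any executed command matches the
// assert pattern, e.g. "go test.*-cover|go test.*-coverprofile".
func ranAssert(pattern string, commands []string) bool {
	re := regexp.MustCompile(pattern)
	for _, cmd := range commands {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

func main() {
	cmds := []string{"go mod init example.com/m", "go test -cover ./..."}
	fmt.Println(ranAssert("go test.*-cover|go test.*-coverprofile", cmds)) // true
}
```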

What you get with CLIWatch

Everything below is running live for Go (see the latest run). Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, v1.0 through v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

```yaml
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%
```

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
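
The gate itself is easy to reason about: compare each model's measured pass rate against its configured threshold and fail CI if any model falls short. A sketch of that check, assuming rates and thresholds as percent values in maps (illustrative types, not cli-bench's actual implementation):

```go
package main

import "fmt"

// checkGates returns the models whose measured pass rate fell
// below their configured threshold (both in percent).
func checkGates(rates, thresholds map[string]float64) []string {
	var failed []string
	for model, minRate := range thresholds {
		if rates[model] < minRate {
			failed = append(failed, model)
		}
	}
	return failed
}

func main() {
	// Thresholds from the config above; rates match the sample table.
	thresholds := map[string]float64{
		"claude-sonnet-4-5": 80,
		"gpt-4.1":           75,
		"claude-haiku-4-5":  60,
	}
	rates := map[string]float64{
		"claude-sonnet-4-5": 95,
		"gpt-4.1":           80,
		"claude-haiku-4-5":  65,
	}
	if failed := checkGates(rates, thresholds); len(failed) > 0 {
		fmt.Println("gate failed for:", failed) // CI would exit non-zero here
	} else {
		fmt.Println("all gates passed")
	}
}
```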

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.

Compare other CLI evals