```shell
# agent runs tests with coverage
$ go test -cover ./...
ok  example.com/myproject  0.003s  coverage: 85.7%

$ go build -o server .
```

Can AI agents use Go?

The Go programming language toolchain. Agents use it to build binaries, run tests, manage modules, and format code.

See the latest run →
100% overall pass rate · 1 model tested · 4 tasks · v1.24.1 · 3/6/2026

Go eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 100% | 4.0 | 8.4k |

Go task results by model

| Task | Difficulty | Description | gpt-5-nano |
|---|---|---|---|
| init-and-run | easy | Initialize a Go module, create a main.go that prints 'Hello, World!', and run it with go run. | 3 turns |
| test-with-coverage | medium | Create a Go module with a file math.go containing a function Add(a, b int) int that returns the sum. Create math_test.go with a test for Add. Run the tests with coverage. | 5 turns |
| format-and-vet | easy | Create a Go module with a deliberately poorly-formatted main.go file. Run go fmt to fix formatting, then run go vet to check for issues. | 4 turns |
| build-binary | medium | Create a Go module with a main.go that accepts a --name flag (default 'World') and prints 'Hello, <name>!'. Build it into a binary called 'greeter', then run the binary with --name=Agent. | 4 turns |
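
The build-binary task can be satisfied by a program along these lines. This is a minimal sketch, not the graded reference solution; the `greet` helper is an illustrative factoring, and only the flag name, default, and output format come from the task description:

```go
// main.go: greeter with a --name flag (default "World"),
// as described in the build-binary task.
package main

import (
	"flag"
	"fmt"
)

// greet formats the greeting for a given name.
func greet(name string) string {
	return fmt.Sprintf("Hello, %s!", name)
}

func main() {
	// --name defaults to "World" per the task description.
	name := flag.String("name", "World", "name to greet")
	flag.Parse()
	fmt.Println(greet(*name))
}
```

Built with `go build -o greeter .` and run as `./greeter --name=Agent`, this prints `Hello, Agent!`, which is what the task's asserts check for.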
Task suite source (51 lines · YAML)

```yaml
- id: init-and-run
  intent: Initialize a Go module, create a main.go that prints 'Hello, World!',
    and run it with go run.
  assert:
    - ran: go mod init
    - ran: go run
    - file_exists: go.mod
    - file_exists: main.go
    - output_contains: Hello, World!
  setup: []
  max_turns: 4
  difficulty: easy
  category: basics
- id: test-with-coverage
  intent: Create a Go module with a file math.go containing a function Add(a, b
    int) int that returns the sum. Create math_test.go with a test for Add. Run
    the tests with coverage.
  assert:
    - ran: go test.*-cover|go test.*-coverprofile
    - file_exists: math.go
    - file_exists: math_test.go
    - exit_code: 0
  setup: []
  max_turns: 6
  difficulty: medium
  category: testing
- id: format-and-vet
  intent: Create a Go module with a deliberately poorly-formatted main.go file.
    Run go fmt to fix formatting, then run go vet to check for issues.
  assert:
    - ran: go fmt|gofmt
    - ran: go vet
    - file_exists: main.go
  setup: []
  max_turns: 5
  difficulty: easy
  category: workflow
- id: build-binary
  intent: Create a Go module with a main.go that accepts a --name flag (default
    'World') and prints 'Hello, <name>!'. Build it into a binary called
    'greeter', then run the binary with --name=Agent.
  assert:
    - ran: go build
    - file_exists: greeter
    - ran: ./greeter.*--name
    - output_contains: Agent
  setup: []
  max_turns: 8
  difficulty: medium
  category: build
```
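
The test-with-coverage task expects the work split across math.go and math_test.go; the sketch below inlines both in one runnable file for illustration, with the test shown as a comment. Only the `Add` signature comes from the task; everything else is an assumption:

```go
// Sketch of the test-with-coverage task. In the real task, Add
// lives in math.go and the test lives in math_test.go.
package main

import "fmt"

// Add returns the sum of a and b (the body math.go would hold).
func Add(a, b int) int {
	return a + b
}

// math_test.go would contain something like:
//
//	func TestAdd(t *testing.T) {
//		if got := Add(2, 3); got != 5 {
//			t.Errorf("Add(2, 3) = %d, want 5", got)
//		}
//	}

func main() {
	fmt.Println(Add(2, 3)) // prints 5
}
```

Running `go test -cover ./...` in such a module reports a coverage percentage, which is what the `ran: go test.*-cover|go test.*-coverprofile` assert looks for.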

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.
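
The `ran:` asserts in the task suite are regular-expression patterns matched against the commands the agent executed. A sketch of that matching follows; `ranAssert` is a hypothetical helper, not cli-bench's actual API, and the harness may implement this differently:

```go
package main

import (
	"fmt"
	"regexp"
)

// ranAssert reports whether any executed command matches the
// assert pattern, e.g. "go test.*-cover|go test.*-coverprofile".
func ranAssert(pattern string, commands []string) bool {
	re := regexp.MustCompile(pattern)
	for _, cmd := range commands {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

func main() {
	cmds := []string{"go mod init example.com/m", "go test -cover ./..."}
	fmt.Println(ranAssert("go test.*-cover|go test.*-coverprofile", cmds)) // true
}
```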

What you get with CLIWatch

Everything below is running live for Go (see the latest run). Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, v1.0 through v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

```yaml
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%
```

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
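
The gate itself is easy to reason about: compare each model's measured pass rate against its configured threshold and fail CI if any model falls short. A sketch of that check, assuming rates and thresholds as percent values in maps (illustrative types, not cli-bench's actual implementation):

```go
package main

import "fmt"

// checkGates returns the models whose measured pass rate fell
// below their configured threshold (both in percent).
func checkGates(rates, thresholds map[string]float64) []string {
	var failed []string
	for model, minRate := range thresholds {
		if rates[model] < minRate {
			failed = append(failed, model)
		}
	}
	return failed
}

func main() {
	// Thresholds from the config above; rates match the sample table.
	thresholds := map[string]float64{
		"claude-sonnet-4-5": 80,
		"gpt-4.1":           75,
		"claude-haiku-4-5":  60,
	}
	rates := map[string]float64{
		"claude-sonnet-4-5": 95,
		"gpt-4.1":           80,
		"claude-haiku-4-5":  65,
	}
	if failed := checkGates(rates, thresholds); len(failed) > 0 {
		fmt.Println("gate failed for:", failed) // CI would exit non-zero here
	} else {
		fmt.Println("all gates passed")
	}
}
```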

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.

Compare other CLI evals