# agent creates a PR
$ gh pr create --title "Fix auth bug" --body "..."
  https://github.com/org/repo/pull/42
 
$ gh pr view 42 --json state,statusCheckRollup
  {"state": "OPEN", "statusCheckRollup": [...]}

Can AI agents use gh?

GitHub's official CLI. Agents use it to create PRs, manage issues, trigger workflows, and query repository data.


75% overall pass rate · 1 model tested · 4 tasks · v2.87.3 · 3/6/2026

gh eval results by model

Model	Pass rate	Avg turns	Avg tokens
gpt-5-nano	75%	3.7	5.5k

gh task results by model

Task	Difficulty	Intent	gpt-5-nano
list-repos	easy	List the public repositories for the 'cli' GitHub organization, showing just the name and description. Limit to 5 results.	✓ 3 turns · 5.5k tokens
view-repo-details	easy	Show details about the 'cli/cli' repository including its description, star count, and primary language.	✗ 1 turn · 1.6k tokens
search-issues	medium	Search for open issues in the 'vercel/next.js' repository that contain the word 'build' in the title. Show the top 3 results with their number and title.	✓ 4 turns · 10.6k tokens
api-query	medium	Use the gh api command to get the latest release tag name for the 'docker/cli' repository.	✓ 4 turns · 4.1k tokens
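
As a quick check on the headline number, the sketch below recomputes the overall pass rate from the four gpt-5-nano outcomes in the table. It assumes the pass rate is simply the fraction of tasks passed; the `pass_rate` helper is a hypothetical name, not part of any CLIWatch tooling.

```python
# Outcomes from the gpt-5-nano column above: True = pass, False = fail.
results = {
    "list-repos": True,
    "view-repo-details": False,
    "search-issues": True,
    "api-query": True,
}


def pass_rate(outcomes: dict[str, bool]) -> float:
    """Percentage of tasks that passed (assumed definition: passed / total)."""
    if not outcomes:
        raise ValueError("no task results to score")
    return 100 * sum(outcomes.values()) / len(outcomes)


print(f"{pass_rate(results):.0f}%")  # 75%
```

Three of four tasks pass, which matches the 75% shown for the run.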

Task suite source (42 lines, YAML)

- id: list-repos
  intent: List the public repositories for the 'cli' GitHub organization, showing
    just the name and description. Limit to 5 results.
  assert:
    - ran: gh.*repo.*list|gh.*api
    - exit_code: 0
  setup: []
  max_turns: 3
  difficulty: easy
  category: query
- id: view-repo-details
  intent: Show details about the 'cli/cli' repository including its description,
    star count, and primary language.
  assert:
    - ran: gh.*repo.*view|gh.*api
    - output_contains: cli
  setup: []
  max_turns: 3
  difficulty: easy
  category: query
- id: search-issues
  intent: Search for open issues in the 'vercel/next.js' repository that contain
    the word 'build' in the title. Show the top 3 results with their number and
    title.
  assert:
    - ran: gh.*search.*issues|gh.*issue.*list|gh.*api
    - exit_code: 0
  setup: []
  max_turns: 4
  difficulty: medium
  category: search
- id: api-query
  intent: Use the gh api command to get the latest release tag name for the
    'docker/cli' repository.
  assert:
    - ran: gh api
    - exit_code: 0
  setup: []
  max_turns: 4
  difficulty: medium
  category: api
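
To make the assertion keys concrete, here is a minimal Python sketch of how a harness might evaluate them. The semantics are assumptions, not documented behavior of @cliwatch/cli-bench: `ran` is treated as a regular expression searched against each command the agent executed, `exit_code` is compared with the last command's exit status, and `output_contains` is a substring check on the combined output. The `Run` record and the `check` and `passes` functions are hypothetical names, and the sample command and output are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one agent attempt at a task."""
    commands: list[str]  # shell commands the agent executed, in order
    exit_code: int       # exit status of the last command
    output: str          # combined stdout/stderr


def check(assertion: dict, run: Run) -> bool:
    """Evaluate a single-key assertion such as {"exit_code": 0}."""
    (kind, expected), = assertion.items()
    if kind == "ran":
        # Assumed: the pattern may match anywhere in any executed command.
        return any(re.search(expected, cmd) for cmd in run.commands)
    if kind == "exit_code":
        return run.exit_code == expected
    if kind == "output_contains":
        return expected in run.output
    raise ValueError(f"unknown assertion: {kind}")


def passes(assertions: list[dict], run: Run) -> bool:
    """A task passes only if every one of its assertions holds."""
    return all(check(a, run) for a in assertions)


# Assertions copied from the list-repos task above.
list_repos = [{"ran": r"gh.*repo.*list|gh.*api"}, {"exit_code": 0}]

# Illustrative run; the output string is a placeholder, not real gh output.
run = Run(
    commands=["gh repo list cli --visibility public --limit 5 --json name,description"],
    exit_code=0,
    output='[{"name": "cli", "description": "..."}]',
)
print(passes(list_repos, run))  # True
```

Under this reading, the `ran` patterns are loose on purpose: `gh.*repo.*list|gh.*api` accepts either the dedicated subcommand or a raw `gh api` call, so a task checks the outcome rather than one exact invocation.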

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for gh — see the latest run. Set up the same for your CLI in minutes.

Model	Pass Rate	Delta
Sonnet 4.5	95%	+5%
GPT-4.1	80%	-5%
Haiku 4.5	65%	-10%

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, releases v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
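
The gating logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CLIWatch's implementation: the thresholds are copied from the config above, the sample pass rates come from the example table earlier on this page (assuming "Sonnet 4.5", "GPT-4.1" and "Haiku 4.5" correspond to the model IDs in the config), and `failing_models` and `gate` are hypothetical names.

```python
# Thresholds from the config above, as percentages.
THRESHOLDS = {
    "claude-sonnet-4-5": 80,
    "gpt-4.1": 75,
    "claude-haiku-4-5": 60,
}


def failing_models(pass_rates: dict[str, float],
                   thresholds: dict[str, float]) -> list[str]:
    """Models below their threshold; a model with no result counts as 0%."""
    return [
        model
        for model, minimum in thresholds.items()
        if pass_rates.get(model, 0.0) < minimum
    ]


def gate(pass_rates: dict[str, float],
         thresholds: dict[str, float] = THRESHOLDS) -> int:
    """Return a process exit code: 0 if every model meets its target, else 1."""
    failed = failing_models(pass_rates, thresholds)
    for model in failed:
        got = pass_rates.get(model, 0.0)
        print(f"FAIL {model}: {got:.0f}% < {thresholds[model]:.0f}%")
    return 1 if failed else 0


# Sample pass rates from the example table on this page.
sample = {"claude-sonnet-4-5": 95, "gpt-4.1": 80, "claude-haiku-4-5": 65}
print(gate(sample))  # 0
```

In a CI job the return value would be passed to `sys.exit`, so any model dropping below its target fails the build.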

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.


Compare other CLI evals

git

npm

aws

fly