# agent creates a PR
$ gh pr create --title "Fix auth bug" --body "..."
  https://github.com/org/repo/pull/42
 
$ gh pr view 42 --json state,statusCheckRollup
  {"state": "OPEN", "statusCheckRollup": [...]}

Can AI agents use gh?

GitHub's official CLI. Agents use it to create PRs, manage issues, trigger workflows, and query repository data.

See the latest run →
75% overall pass rate · 1 model tested · 4 tasks · v2.87.3 · 3/6/2026

gh eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
| --- | --- | --- | --- |
| gpt-5-nano | 75% | 3.7 | 5.5k |

gh task results by model

| Task | Difficulty | Description | gpt-5-nano |
| --- | --- | --- | --- |
| list-repos | easy | List the public repositories for the 'cli' GitHub organization, showing just the name and description. Limit to 5 results. | 3t |
| view-repo-details | easy | Show details about the 'cli/cli' repository including its description, star count, and primary language. | 1t |
| search-issues | medium | Search for open issues in the 'vercel/next.js' repository that contain the word 'build' in the title. Show the top 3 results with their number and title. | 4t |
| api-query | medium | Use the gh api command to get the latest release tag name for the 'docker/cli' repository. | 4t |
Task suite source · 42 lines · YAML
- id: list-repos
  intent: List the public repositories for the 'cli' GitHub organization, showing
    just the name and description. Limit to 5 results.
  assert:
    - ran: gh.*repo.*list|gh.*api
    - exit_code: 0
  setup: []
  max_turns: 3
  difficulty: easy
  category: query
- id: view-repo-details
  intent: Show details about the 'cli/cli' repository including its description,
    star count, and primary language.
  assert:
    - ran: gh.*repo.*view|gh.*api
    - output_contains: cli
  setup: []
  max_turns: 3
  difficulty: easy
  category: query
- id: search-issues
  intent: Search for open issues in the 'vercel/next.js' repository that contain
    the word 'build' in the title. Show the top 3 results with their number and
    title.
  assert:
    - ran: gh.*search.*issues|gh.*issue.*list|gh.*api
    - exit_code: 0
  setup: []
  max_turns: 4
  difficulty: medium
  category: search
- id: api-query
  intent: Use the gh api command to get the latest release tag name for the
    'docker/cli' repository.
  assert:
    - ran: gh api
    - exit_code: 0
  setup: []
  max_turns: 4
  difficulty: medium
  category: api
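
The `assert` entries in the suite suggest three kinds of checks: a regex over the commands the agent ran, an exit code, and a substring of the output. The following is a minimal sketch of how such checks could be evaluated. It is not the @cliwatch/cli-bench implementation: the `RunRecord` type, its field names, and the matching semantics (regex search against any executed command) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical record of one agent attempt; field names are assumptions."""
    commands: list[str]  # shell commands the agent executed, in order
    exit_code: int       # exit code of the final command
    output: str          # text the agent produced for the user


def check_assertion(assertion: dict, run: RunRecord) -> bool:
    """Evaluate one single-key assertion such as {"ran": "gh api"}."""
    (kind, expected), = assertion.items()
    if kind == "ran":
        # Assumed semantics: the regex must match at least one executed command.
        return any(re.search(expected, cmd) for cmd in run.commands)
    if kind == "exit_code":
        return run.exit_code == expected
    if kind == "output_contains":
        return expected in run.output
    raise ValueError(f"unknown assertion type: {kind}")


def task_passes(assertions: list[dict], run: RunRecord) -> bool:
    """A task passes only if every assertion holds."""
    return all(check_assertion(a, run) for a in assertions)


# Assertions copied from the list-repos task; the run itself is made up.
list_repos_assert = [{"ran": r"gh.*repo.*list|gh.*api"}, {"exit_code": 0}]
run = RunRecord(
    commands=["gh repo list cli --limit 5 --json name,description"],
    exit_code=0,
    output="",
)
print(task_passes(list_repos_assert, run))  # True
```

Note that a pattern like `gh.*api` is permissive: under these assumed semantics any command containing `gh` followed later by `api` would satisfy it, which is why pairing `ran` with `exit_code` or `output_contains` makes a task stricter.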

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for gh; see the latest run. Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
| --- | --- | --- |
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
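
As a rough illustration of what such a comment might contain, the sketch below renders per-model pass rates and deltas as a table and lists the models that regressed. The function name `render_pr_comment`, the input shape, and the comment layout are hypothetical, and the numbers are the sample values shown on this page rather than real eval output.

```python
def render_pr_comment(results: dict[str, tuple[int, int]]) -> str:
    """Build comment text from {model: (pass rate %, delta in points)}."""
    lines = ["| Model | Pass rate | Delta |", "| --- | --- | --- |"]
    regressions = []
    for model, (rate, delta) in results.items():
        # The "+d" format spec always shows the sign, e.g. +5 or -10.
        lines.append(f"| {model} | {rate}% | {delta:+d}% |")
        if delta < 0:
            regressions.append(model)
    if regressions:
        lines.append("")
        lines.append("Regressions: " + ", ".join(regressions))
    return "\n".join(lines)


# Sample values taken from the illustration on this page.
sample = {
    "Sonnet 4.5": (95, 5),
    "GPT-4.1": (80, -5),
    "Haiku 4.5": (65, -10),
}
comment = render_pr_comment(sample)
print(comment)
```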

[Chart: pass rate over the last 30 days, v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
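
The gate logic amounts to comparing each model's measured pass rate with its threshold and failing the job if any model falls short. Here is a minimal sketch under stated assumptions: the thresholds mirror the config shown above, while `failing_models`, `gate_exit_code`, and the measured pass rates are hypothetical and not part of CLIWatch.

```python
# Thresholds copied from the config above, as percentages.
THRESHOLDS = {
    "claude-sonnet-4-5": 80,
    "gpt-4.1": 75,
    "claude-haiku-4-5": 60,
}


def failing_models(pass_rates: dict[str, float],
                   thresholds: dict[str, float]) -> list[str]:
    """Return models whose pass rate is below their threshold.

    A model with no measured pass rate is treated as 0% (an assumption).
    """
    return sorted(
        model
        for model, minimum in thresholds.items()
        if pass_rates.get(model, 0.0) < minimum
    )


def gate_exit_code(pass_rates: dict[str, float]) -> int:
    """Return 0 if every model meets its threshold, 1 otherwise."""
    failed = failing_models(pass_rates, THRESHOLDS)
    for model in failed:
        print(f"{model}: {pass_rates.get(model, 0.0)}% < {THRESHOLDS[model]}%")
    return 1 if failed else 0


# Hypothetical measurements: gpt-4.1 has dropped below its 75% target.
measured = {"claude-sonnet-4-5": 95, "gpt-4.1": 70, "claude-haiku-4-5": 65}
exit_code = gate_exit_code(measured)
print("exit code:", exit_code)
```

In a CI step the returned value would typically be passed to `sys.exit`, so a nonzero code fails the job.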

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.
