CLI Agent-Readiness Directory

How well do popular CLIs work with AI coding agents? Each CLI is independently evaluated on its own set of real-world tasks. Scores reflect per-CLI agent-readiness, not cross-CLI rankings; different CLIs have different tasks and complexity levels.

Grade reflects average pass rate.
CLI Category        Pass Rate   Tasks

A  Agent Ready (6 CLIs)
Cloud               100%        4
Developer Tools     100%        4
Developer Tools     100%        4
Developer Tools     100%        4
Package Managers    100%        4
Cloud / DevOps      94%         18

B  Almost There (6 CLIs)
Developer Tools     89%         18
Cloud / DevOps      78%         18
Developer Tools     75%         4
Data Processing     88%         16
Package Managers    89%         19
Developer Tools     86%         14

C  Room to Grow (7 CLIs)
Package Managers    63%         19
Cloud               58%         19
Cloud / DevOps      50%         4
Package Managers    63%         19
Database            50%         4
Database            71%         14
Developer Tools     50%         14

This is an early-stage showcase. Each CLI has its own task suite, so cross-CLI comparison is approximate. Our goal is standardized, comparable evals. We're adding more tasks and models regularly.
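
As a rough illustration of how a pass rate and a letter grade could be derived, here is a minimal sketch. The cutoffs (90 for A, 75 for B) and the idea of averaging across models are assumptions made for this example; the directory does not state its actual thresholds or averaging method.

```python
# Hypothetical grade cutoffs, chosen only so the example runs; the
# directory does not publish its actual thresholds.
ASSUMED_CUTOFFS = [(90.0, "A"), (75.0, "B")]


def pass_rate(passed: int, total: int) -> float:
    """Return the percentage of tasks passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def average_pass_rate(rates: list[float]) -> float:
    """Average several pass rates (assumed here: one per model) for one CLI."""
    if not rates:
        raise ValueError("need at least one rate")
    return sum(rates) / len(rates)


def grade(avg_rate: float) -> str:
    """Map an average pass rate to a letter using the assumed cutoffs."""
    for cutoff, letter in ASSUMED_CUTOFFS:
        if avg_rate >= cutoff:
            return letter
    return "C"


# Two hypothetical models on an 18-task suite.
rates = [pass_rate(17, 18), pass_rate(14, 18)]
print(round(average_pass_rate(rates)), grade(average_pass_rate(rates)))
```

Because each CLI has its own task suite, a grade computed this way only describes that CLI's suite and is not directly comparable across CLIs.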

Explore evals

Detailed eval results for each CLI: per-model pass rates, per-task breakdowns, and the full task suite source.

git
# agent checks repo status and history
$ git status --porcelain
  M  src/index.ts
  ?? new-file.ts
 
Git · Developer Tools

The ubiquitous version control system. Agents use it to commit changes, manage branches, resolve conflicts, and navigate repository history.
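
One reason `--porcelain` output suits agents is that each line has a fixed layout: a two-character status code, a space, then the path. The sketch below shows one way an agent-side script might split it. The function name `parse_porcelain` is hypothetical, and the sample assumes the two-space indent shown in the transcript above is display formatting only.

```python
def parse_porcelain(output: str) -> list[tuple[str, str]]:
    """Split `git status --porcelain` (v1) output into (status, path) pairs.

    Each line is a two-character XY status code, a space, then the path.
    Renames ("old -> new") and quoted paths are not handled in this sketch.
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        entries.append((line[:2], line[3:]))
    return entries


sample = "M  src/index.ts\n?? new-file.ts\n"
for status, path in parse_porcelain(sample):
    print(repr(status), path)
```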

jq
# agent extracts data from API response
$ echo '[{"name":"a","score":95},{"name":"b","score":72}]' |
    jq '.[] | select(.score > 80) | .name'
  "a"
jq · Data Processing

A lightweight command-line JSON processor. Agents use it to parse, filter, transform, and format JSON data from APIs and files.
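
To make the filter in the transcript concrete, the following sketch reproduces `.[] | select(.score > 80) | .name` in plain Python. It is only an equivalent for this one filter, not a description of how jq works internally; the function name `names_above` is hypothetical.

```python
import json


def names_above(raw: str, threshold: int) -> list[str]:
    """Mirror `.[] | select(.score > threshold) | .name` in plain Python."""
    records = json.loads(raw)
    return [r["name"] for r in records if r["score"] > threshold]


raw = '[{"name":"a","score":95},{"name":"b","score":72}]'
print(names_above(raw, 80))
```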

npm
# agent manages dependencies
$ npm ls --depth=0 --json
  {"dependencies":{"express":{"version":"4.21.0"},
   "typescript":{"version":"5.7.0"}}}
 
npm · Package Managers

The Node.js package manager. Agents use it to install dependencies, run scripts, manage versions, and publish packages.
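
A script consuming `npm ls --depth=0 --json` might flatten the result into a name-to-version map, as sketched below. The sample input is the abbreviated output shown in the transcript; real output carries additional fields, and `top_level_versions` is a hypothetical helper name.

```python
import json


def top_level_versions(ls_json: str) -> dict[str, str]:
    """Map each top-level dependency name to its installed version."""
    doc = json.loads(ls_json)
    deps = doc.get("dependencies", {})
    # Entries without a "version" key (e.g. missing packages) are marked unknown.
    return {name: info.get("version", "unknown") for name, info in deps.items()}


sample = (
    '{"dependencies":{"express":{"version":"4.21.0"},'
    '"typescript":{"version":"5.7.0"}}}'
)
print(top_level_versions(sample))
```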

kubectl
# agent checks pod health
$ kubectl get pods -n production -o json
  {"items": [{"metadata": {"name": "api-7d4.."},
   "status": {"phase": "Running"}}]}
 
kubectl · Cloud / DevOps

The Kubernetes command-line tool. Agents use it to manage clusters, inspect workloads, debug pods, and apply manifests.
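
A health check built on `kubectl get pods -o json` could look roughly like the sketch below, which lists pods whose phase is not `Running`. The pod names in the sample are hypothetical, and the sample keeps only the `metadata.name` and `status.phase` fields shown in the transcript.

```python
import json


def pods_not_running(pod_list_json: str) -> list[str]:
    """Return names of pods whose status.phase is anything other than Running."""
    doc = json.loads(pod_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


# Hypothetical pod names, for illustration only.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "api-example"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "worker-example"}, "status": {"phase": "Pending"}},
    ]
})
print(pods_not_running(sample))
```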

gh
# agent creates a PR
$ gh pr create --title "Fix auth bug" --body "..."
  https://github.com/org/repo/pull/42
 
$ gh pr view 42 --json state,statusCheckRollup
gh · Developer Tools

GitHub's official CLI. Agents use it to create PRs, manage issues, trigger workflows, and query repository data.

docker
# agent inspects a running container
$ docker ps --format json
  {"ID":"a1b2c3","Names":"api","Status":"Up 2h"}
 
$ docker logs api --tail 10
Docker · Cloud / DevOps

Container management CLI. Agents build images, run containers, manage volumes, and inspect running services.
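
`docker ps --format json` prints one JSON object per line rather than a single array, so a consumer typically parses it line by line. The sketch below assumes that layout; the second sample container is hypothetical, and the `Status` prefix check is a simplification.

```python
import json


def parse_ps_lines(output: str) -> list[dict]:
    """Parse newline-delimited JSON, one container object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def names_up(containers: list[dict]) -> list[str]:
    """Return names of containers whose Status string starts with 'Up'."""
    return [c["Names"] for c in containers if c.get("Status", "").startswith("Up")]


sample = (
    '{"ID":"a1b2c3","Names":"api","Status":"Up 2h"}\n'
    # Hypothetical second container, for illustration only.
    '{"ID":"d4e5f6","Names":"worker","Status":"Exited (1) 5 minutes ago"}\n'
)
print(names_up(parse_ps_lines(sample)))
```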

terraform
# agent plans and inspects state
$ terraform plan -json
  {"type":"planned_change","change":{
   "resource":"local_file.config",
   "action":"create"}}
Terraform · Cloud / DevOps

Infrastructure as code tool. Agents plan and apply infrastructure changes, inspect state, and manage workspaces.

aws
# agent lists EC2 instances
$ aws ec2 describe-instances --query \
    'Reservations[].Instances[].{Id:InstanceId,
     State:State.Name}' --output json
  [{"Id":"i-0a1b2c","State":"running"}]
AWS · Cloud

Amazon Web Services CLI. Agents manage cloud resources, configure services, and query infrastructure across AWS.
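
The `--query` expression in the transcript flattens reservations and instances, then projects two fields. The sketch below performs the same projection in plain Python over a trimmed `describe-instances` response. It is not a JMESPath implementation, and `instance_states` is a hypothetical helper name.

```python
def instance_states(response: dict) -> list[dict]:
    """Project each instance to {Id, State}, flattening across reservations."""
    return [
        {"Id": inst["InstanceId"], "State": inst["State"]["Name"]}
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
    ]


# Trimmed response containing only the fields the projection reads.
sample = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0a1b2c", "State": {"Name": "running"}}]}
    ]
}
print(instance_states(sample))
```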

vercel
# agent deploys to production
$ vercel --prod --yes
  Deploying to production...
  https://app.example.com
 
Vercel · Developer Tools

Frontend deployment platform CLI. Agents deploy projects, manage environment variables, and configure domains.

stripe
# agent retrieves a customer
$ stripe customers list --limit 1
  {"data": [{"id": "cus_abc123",
   "email": "user@example.com"}]}
Stripe · Developer Tools

Payment infrastructure CLI. Agents listen to webhooks, trigger test events, and manage Stripe resources.

fly
# agent checks app status
$ fly status --json
  {"Name":"api","Status":"deployed",
   "Machines":[{"region":"iad","state":"started"}]}
Fly.io · Cloud

Edge computing platform CLI. Agents deploy apps, manage machines, scale regions, and monitor deployments.

supabase
# agent runs a migration
$ supabase db push
  Connecting to remote database...
  Applying migration 20260208_add_users.sql...
  Finished supabase db push.
Supabase · Database

Open-source Firebase alternative CLI. Agents manage databases, run migrations, generate types, and manage edge functions.

cargo
# agent builds and tests a Rust project
$ cargo test
   Compiling mylib v0.1.0
     Running unittests src/lib.rs
  test result: ok. 3 passed; 0 failed
Cargo · Package Managers

The Rust package manager and build system. Agents use it to create projects, manage dependencies, run tests, and build optimized binaries.
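
`cargo test` output is human-oriented text, so a script usually has to pattern-match the summary line. The sketch below is one possible approach; the regular expression is an assumption based on the summary line shown in the transcript, and `parse_summary` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches e.g. "test result: ok. 3 passed; 0 failed"; trailing fields are ignored.
SUMMARY_RE = re.compile(r"test result: (\w+)\. (\d+) passed; (\d+) failed")


def parse_summary(line: str) -> Optional[dict]:
    """Extract status and counts from a summary line, or None if absent."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return {
        "ok": m.group(1) == "ok",
        "passed": int(m.group(2)),
        "failed": int(m.group(3)),
    }


print(parse_summary("  test result: ok. 3 passed; 0 failed"))
```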

curl
# agent tests an API endpoint
$ curl -s https://httpbin.org/get | jq .origin
  "203.0.113.42"
 
$ curl -w '%{http_code}' -o /dev/null -s https://example.com
curl · Developer Tools

The universal command-line HTTP client. Agents use it to make API calls, download files, test endpoints, and debug HTTP traffic.

go
# agent runs tests with coverage
$ go test -cover ./...
  ok  example.com/myproject  0.003s  coverage: 85.7%
 
$ go build -o server .
Go · Developer Tools

The Go programming language toolchain. Agents use it to build binaries, run tests, manage modules, and format code.
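
To act on the coverage figure that `go test -cover` prints, a script might extract the percentage from each package line. The sketch below assumes the `coverage: NN.N%` wording shown in the transcript; `coverage_percent` is a hypothetical helper name.

```python
import re
from typing import Optional

COVERAGE_RE = re.compile(r"coverage: (\d+(?:\.\d+)?)%")


def coverage_percent(line: str) -> Optional[float]:
    """Return the coverage percentage found in a `go test -cover` line, if any."""
    m = COVERAGE_RE.search(line)
    return float(m.group(1)) if m else None


print(coverage_percent("ok  example.com/myproject  0.003s  coverage: 85.7%"))
```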

pnpm
# agent manages a monorepo
$ pnpm add lodash
  + lodash 4.17.21
 
$ pnpm ls --depth=0
pnpm · Package Managers

Fast, disk-efficient JavaScript package manager. Agents use it to install dependencies, run scripts, and manage monorepo workspaces.

pip3
# agent manages Python dependencies
$ pip3 install requests
  Successfully installed requests-2.31.0
 
$ pip3 freeze > requirements.txt
pip · Package Managers

The Python package installer. Agents use it to install libraries into virtual environments, freeze requirements, and check installed dependencies for conflicts.

psql
# agent queries a database
$ psql -c "SELECT name, score FROM users ORDER BY score DESC LIMIT 3"
   name  | score
  -------+-------
   alice |    95
psql · Database

The PostgreSQL interactive terminal. Agents use it to run queries, manage schemas, import/export data, and administer databases.

make
# agent runs build targets
$ make build
  gcc -Wall -O2 -o app main.c
 
$ make test
GNU Make · Developer Tools

The classic build automation tool. Agents use it to run build targets, manage dependencies between tasks, and automate project workflows.


Want to know how well AI agents can use your CLI?

Track pass rates, compare models, and catch regressions across releases.