CLI Agent-Readiness Directory
How well do popular CLIs work with AI coding agents? Each CLI is independently evaluated on its own set of real-world tasks. Scores reflect per-CLI agent-readiness, not cross-CLI rankings; different CLIs have different tasks and complexity levels.
| CLI | Category | Pass Rate | Tasks |
|---|---|---|---|
| **A: Agent Ready (6 CLIs)** | | | |
| | Cloud | 100% | 4 |
| | Developer Tools | 100% | 4 |
| | Developer Tools | 100% | 4 |
| | Developer Tools | 100% | 4 |
| | Package Managers | 100% | 4 |
| | Cloud / DevOps | 94% | 18 |
| **B: Almost There (6 CLIs)** | | | |
| | Developer Tools | 89% | 18 |
| | Cloud / DevOps | 78% | 18 |
| | Developer Tools | 75% | 4 |
| | Data Processing | 88% | 16 |
| | Package Managers | 89% | 19 |
| | Developer Tools | 86% | 14 |
| **C: Room to Grow (7 CLIs)** | | | |
| | Package Managers | 63% | 19 |
| | Cloud | 58% | 19 |
| | Cloud / DevOps | 50% | 4 |
| | Package Managers | 63% | 19 |
| | Database | 50% | 4 |
| | Database | 71% | 14 |
| | Developer Tools | 50% | 14 |
This is an early-stage showcase. Each CLI has its own task suite, so cross-CLI comparison is approximate. Our goal is standardized, comparable evals. We're adding more tasks and models regularly.
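To see why suite size makes cross-CLI comparison approximate, here is a minimal sketch of the arithmetic. It assumes a pass rate is simply passed tasks divided by total tasks, rounded to a whole percent; the page does not state how results are aggregated across models, so treat `pass_rate` and `step_size` as illustrative helpers only.

```python
def pass_rate(passed: int, total: int) -> int:
    """Pass rate as a whole-number percentage (assumed formula: passed / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total)


def step_size(total: int) -> float:
    """Percentage points that a single task contributes to the score."""
    return 100 / total


# One failed task weighs very differently depending on suite size.
small_suite = pass_rate(3, 4)    # one failure out of 4 tasks
large_suite = pass_rate(17, 18)  # one failure out of 18 tasks

print(small_suite, step_size(4))
print(large_suite, round(step_size(18), 1))
```

Under this assumption, a single failure moves a 4-task suite by 25 points but an 18-task suite by under 6, so scores from suites of different sizes are not directly comparable.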
Explore evals
Detailed eval results for each CLI — per-model pass rates, per-task breakdowns, and the full task suite source.

    # agent checks repo status and history
    $ git status --porcelain
     M src/index.ts
    ?? new-file.ts

The ubiquitous version control system. Agents use it to commit changes, manage branches, resolve conflicts, and navigate repository history.
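As a rough illustration of how an agent harness might consume `git status --porcelain` output, the sketch below splits each line into the two status columns and the path. The `StatusEntry` type and `parse_porcelain` helper are hypothetical names, the sample assumes the modified file is unstaged (a leading space in the first column), and renames and quoted paths are deliberately not handled.

```python
from typing import NamedTuple


class StatusEntry(NamedTuple):
    index: str     # first status column: staging area
    worktree: str  # second status column: working tree
    path: str


def parse_porcelain(output: str) -> list[StatusEntry]:
    """Parse porcelain v1 lines of the form 'XY path' (simple cases only)."""
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold two status columns, a space, and a path
        entries.append(StatusEntry(line[0], line[1], line[3:]))
    return entries


# Sample text mirroring the transcript above (assumed unstaged modification).
sample = " M src/index.ts\n?? new-file.ts\n"
entries = parse_porcelain(sample)
untracked = [e.path for e in entries if e.index == "?" and e.worktree == "?"]
print(entries)
print(untracked)
```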
View analysis →

    # agent extracts data from API response
    $ echo '[{"name":"a","score":95},{"name":"b","score":72}]' | jq '.[] | select(.score > 80) | .name'
    "a"

A lightweight command-line JSON processor. Agents use it to parse, filter, transform, and format JSON data from APIs and files.
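For comparison, the same filter can be sketched in Python with only the standard library. This is not how jq works internally; it is just an equivalent of `.[] | select(.score > 80) | .name` for the sample input shown in the transcript.

```python
import json

# Same input as the jq transcript.
raw = '[{"name":"a","score":95},{"name":"b","score":72}]'

records = json.loads(raw)
# Equivalent of: .[] | select(.score > 80) | .name
names = [record["name"] for record in records if record["score"] > 80]

# jq prints each result as its own JSON value, one per line.
for name in names:
    print(json.dumps(name))
```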
View analysis →

    # agent manages dependencies
    $ npm ls --depth=0 --json
    {"dependencies":{"express":{"version":"4.21.0"},
     "typescript":{"version":"5.7.0"}}}

The Node.js package manager. Agents use it to install dependencies, run scripts, manage versions, and publish packages.
View analysis →

    # agent checks pod health
    $ kubectl get pods -n production -o json
    {"items": [{"metadata": {"name": "api-7d4.."},
      "status": {"phase": "Running"}}]}

The Kubernetes command-line tool. Agents use it to manage clusters, inspect workloads, debug pods, and apply manifests.
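A minimal sketch of how a harness might act on `kubectl get pods -o json` output: it lists pods whose `status.phase` is neither `Running` nor `Succeeded`. The `unhealthy_pods` helper and the pod names in the sample are hypothetical, and a `Running` phase alone does not guarantee that every container is ready.

```python
import json


def unhealthy_pods(pod_list_json: str) -> list[str]:
    """Return names of pods whose phase is not Running or Succeeded."""
    pods = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]


# Hypothetical sample in the same shape as the transcript's output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "api-1"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "worker-1"}, "status": {"phase": "Pending"}},
    ]
})

print(unhealthy_pods(sample))
```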
View analysis →

    # agent creates a PR
    $ gh pr create --title "Fix auth bug" --body "..."
    https://github.com/org/repo/pull/42
    $ gh pr view 42 --json state,statusCheckRollup

GitHub's official CLI. Agents use it to create PRs, manage issues, trigger workflows, and query repository data.
View analysis →

    # agent inspects a running container
    $ docker ps --format json
    {"ID":"a1b2c3","Names":"api","Status":"Up 2h"}
    $ docker logs api --tail 10

Container management CLI. Agents build images, run containers, manage volumes, and inspect running services.
View analysis →

    # agent plans and inspects state
    $ terraform plan -json
    {"type":"planned_change","change":{
      "resource":"local_file.config",
      "action":"create"}}

Infrastructure as code tool. Agents plan and apply infrastructure changes, inspect state, and manage workspaces.
View analysis →

    # agent lists EC2 instances
    $ aws ec2 describe-instances \
        --query 'Reservations[].Instances[].{Id:InstanceId, State:State.Name}' \
        --output json
    [{"Id":"i-0a1b2c","State":"running"}]

Amazon Web Services CLI. Agents manage cloud resources, configure services, and query infrastructure across AWS.
View analysis →

    # agent deploys to production
    $ vercel --prod --yes
    Deploying to production...
    https://app.example.com

Frontend deployment platform CLI. Agents deploy projects, manage environment variables, and configure domains.
View analysis →

    # agent retrieves a customer
    $ stripe customers list --limit 1
    {"data": [{"id": "cus_abc123",
      "email": "user@example.com"}]}

Payment infrastructure CLI. Agents listen to webhooks, trigger test events, and manage Stripe resources.
View analysis →

    # agent checks app status
    $ fly status --json
    {"Name":"api","Status":"deployed",
     "Machines":[{"region":"iad","state":"started"}]}

Edge computing platform CLI. Agents deploy apps, manage machines, scale regions, and monitor deployments.
View analysis →

    # agent runs a migration
    $ supabase db push
    Connecting to remote database...
    Applying migration 20260208_add_users.sql...
    Finished supabase db push.

Open-source Firebase alternative CLI. Agents manage databases, run migrations, generate types, and manage edge functions.
View analysis →

    # agent builds and tests a Rust project
    $ cargo test
       Compiling mylib v0.1.0
         Running unittests src/lib.rs
    test result: ok. 3 passed; 0 failed

The Rust package manager and build system. Agents use it to create projects, manage dependencies, run tests, and build optimized binaries.
View analysis →

    # agent tests an API endpoint
    $ curl -s https://httpbin.org/get | jq .origin
    "203.0.113.42"
    $ curl -w '%{http_code}' -o /dev/null -s https://example.com

The universal command-line HTTP client. Agents use it to make API calls, download files, test endpoints, and debug HTTP traffic.
View analysis →

    # agent runs tests with coverage
    $ go test -cover ./...
    ok example.com/myproject 0.003s coverage: 85.7%
    $ go build -o server .

The Go programming language toolchain. Agents use it to build binaries, run tests, manage modules, and format code.
View analysis →

    # agent manages a monorepo
    $ pnpm add lodash
    + lodash 4.17.21
    $ pnpm ls --depth=0

Fast, disk-efficient JavaScript package manager. Agents use it to install dependencies, run scripts, and manage monorepo workspaces.
View analysis →

    # agent manages Python dependencies
    $ pip3 install requests
    Successfully installed requests-2.31.0
    $ pip3 freeze > requirements.txt

The Python package installer. Agents use it to install libraries, manage virtual environments, freeze requirements, and audit dependencies.
View analysis →

    # agent queries a database
    $ psql -c "SELECT name, score FROM users ORDER BY score DESC LIMIT 3"
     name  | score
    -------+-------
     alice |    95

The PostgreSQL interactive terminal. Agents use it to run queries, manage schemas, import/export data, and administer databases.
View analysis →

    # agent runs build targets
    $ make build
    gcc -Wall -O2 -o app main.c
    $ make test

The classic build automation tool. Agents use it to run build targets, manage dependencies between tasks, and automate project workflows.
View analysis →

Want to know how well AI agents can use your CLI?
Track pass rates, compare models, and catch regressions across releases.
