SKILLS Docs vs CLI --help: Where Do Agent Tokens Actually Go?
We measured token consumption for two approaches to giving agents CLI knowledge. The answer depends on whether the model has already seen your CLI.
Anthropic's Advanced Tool Use showed that loading all tool definitions upfront can burn 55–134K tokens before an agent does any work. Their solution — dynamic tool discovery — cut usage by 85%.
CLIs already have a version of this: --help. But there's a nuance that most discussions miss: for popular CLIs, the model already knows the commands from training data. The token comparison only gets real for CLIs the model hasn't seen.
Two approaches
Upfront loading — dump full docs into context (SKILLS.md, CLAUDE.md, MCP tool definitions). Complete knowledge, but you pay for everything whether the agent needs it or not.
Progressive discovery — a one-line hint in AGENTS.md tells the agent to use --help. It runs cli --help, picks a subcommand, drills deeper. Each step loads only what's needed.
The one-line hint in AGENTS.md:

```
## wrangler
Discover commands: run `wrangler --help`, then `wrangler <cmd> --help`.
```

The resulting session:

```
$ wrangler --help           # 836 tokens — see all commands
→ agent picks "deploy"
$ wrangler deploy --help    # 1,158 tokens — full flag reference
→ agent executes
```

Total: ~2,000 tokens (vs ~11,000 for full docs)
Where it actually matters
For kubectl, gh, or docker, frontier models already know the commands from training data. The agent rarely needs --help at all; it just runs the command it already knows. For popular CLIs, the comparison between the two approaches is mostly academic.
The comparison gets real for CLIs the model hasn't seen: Cloudflare's wrangler, your company's internal tools, any CLI released after the training cutoff. Here the agent truly starts from zero.
| CLI | Full docs | Progressive | Savings | In training data? |
|---|---|---|---|---|
| wrangler | ~11K | ~2K (deploy) | 82% | Partially |
| kubectl | ~30K | ~1.5K (restart) | 95% | Yes |
| gh | ~116K | ~2.1K (create PR) | 98% | Yes |
| your-cli | ? | ? | — | No |
The last row is the one that matters. For your CLI — internal tools, new projects, anything post-training-cutoff — the agent has zero knowledge and must discover everything. (Methodology: crawled subcommand help two levels deep, ~4 chars/token. These numbers measure help text size, not actual agent consumption. Real usage depends on whether the agent navigates correctly or backtracks.)
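The methodology above can be sketched in a few lines. This is a simplified version (one help invocation at a time rather than a full two-level crawl, since parsing subcommand names out of help text is CLI-specific), using the same ~4 chars/token heuristic; the function names are ours, not from any library:

```python
import subprocess
import sys

def help_text(*cmd: str) -> str:
    """Capture `<cmd> --help` output (stdout + stderr, since CLIs differ
    in which stream they use for help)."""
    result = subprocess.run([*cmd, "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr

def estimate_tokens(text: str) -> int:
    """Rough heuristic from the methodology: ~4 characters per token."""
    return len(text) // 4

# Example usage against any installed CLI:
#   estimate_tokens(help_text("wrangler"))            # top-level help
#   estimate_tokens(help_text("wrangler", "deploy"))  # one level deeper
```

For precise counts you would swap the heuristic for a real tokenizer, but the 4-chars/token approximation is enough to compare orders of magnitude.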
The catch: help text quality
Progressive discovery only saves tokens when the agent navigates correctly on the first try. Vague descriptions cause backtracking.
Vague:

```
$ mycli --help
Commands:
  manage    Manage things
  config    Configuration
  run       Run stuff
```

Specific:

```
$ mycli --help
Commands:
  manage    Create, update, and delete deployments
  config    Get/set configuration values and profiles
  run       Execute a deployment pipeline for a service
```
Vague top-level help means the agent tries manage --help, backtracks to run --help — 3x the tokens, 2 wasted turns. For niche CLIs where the model can't fall back on training knowledge, help text quality is the entire game.
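The cost of one wrong turn is easy to model with the wrangler numbers from earlier. The wrong-subcommand figure below is an assumption for illustration (a similarly sized help page); only the other two numbers come from the measurements above:

```python
# Token cost of one wrong drill-down during progressive discovery.
TOP_LEVEL = 836   # wrangler --help (measured above)
RIGHT_SUB = 1158  # wrangler deploy --help (measured above)
WRONG_SUB = 1000  # assumed: a wrong subcommand's help, similar size

direct = TOP_LEVEL + RIGHT_SUB                 # agent navigates correctly
backtrack = TOP_LEVEL + WRONG_SUB + RIGHT_SUB  # one wasted drill-down

print(direct, backtrack)  # each extra wrong turn adds another ~1K tokens
```

Each additional wrong drill-down stacks another help page (and another agent turn) on top, which is how a vague help tree ends up costing multiples of the happy path.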
But --help wasn't designed for agents
Progressive --help works, but it's freeform text designed for humans. There's no way to express workflows, common patterns, or “if you're trying to do X, start here.” And the AGENTS.md bootstrap hint is manual and unstandardized.
What if CLIs shipped a discovery mechanism designed for agents? A structured, progressive cli skills command that returns exactly what an agent needs at each level — with examples, workflows, and machine-readable output.
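As a purely hypothetical illustration of "structured and progressive", such a command's top-level response might carry summaries, drill-down pointers, and workflows. Every field name here is an assumption, not a protocol; the real design is the subject of Part 2:

```python
# Hypothetical output of `mycli skills`, sketched as a Python dict.
# Nothing here is a real schema; it only illustrates the idea of
# machine-readable, progressively loadable CLI knowledge.
skills_top_level = {
    "version": "0.1",
    "commands": [
        {
            "name": "deploy",
            "summary": "Execute a deployment pipeline for a service",
            "drill_down": "mycli skills deploy",  # next level, on demand
        },
    ],
    "workflows": [
        {
            "goal": "ship a new service",
            "steps": ["mycli config set ...", "mycli deploy"],
        },
    ],
}
```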
That's what we explore in Part 2: Designing a CLI Skills Protocol.
How efficient is your CLI for agents?
Evaluate your CLI with real LLM agents. Measure pass rate, turns, and token consumption per task.