
SKILLS Docs vs CLI --help: Where Do Agent Tokens Actually Go?

We measured token consumption for two approaches to giving agents CLI knowledge. The answer depends on whether the model has already seen your CLI.

Anthropic's Advanced Tool Use post showed that loading all tool definitions upfront can burn 55–134K tokens before an agent does any work. Their solution — dynamic tool discovery — cut usage by 85%.

CLIs already have a version of this: --help. But there's a nuance that most discussions miss: for popular CLIs, the model already knows the commands from training data. The token comparison only gets real for CLIs the model hasn't seen.

Two approaches

Upfront loading — dump full docs into context (SKILLS.md, CLAUDE.md, MCP tool definitions). Complete knowledge, but you pay for everything whether the agent needs it or not.

Progressive discovery — a one-line hint in AGENTS.md tells the agent to use --help. It runs cli --help, picks a subcommand, drills deeper. Each step loads only what's needed.

AGENTS.md bootstrap (~30 tokens) → progressive discovery
## wrangler
Discover commands: run `wrangler --help`, then `wrangler <cmd> --help`.

$ wrangler --help          # 836 tokens — see all commands
→ agent picks "deploy"
$ wrangler deploy --help   # 1,158 tokens — full flag reference
→ agent executes

Total: ~2,000 tokens (vs ~11,000 for full docs)

Where it actually matters

For kubectl, gh, or docker, frontier models already know the commands. The agent rarely needs --help at all — it just runs the command from training knowledge. Both approaches are mostly theoretical for popular CLIs.

The comparison gets real for CLIs the model hasn't seen: Cloudflare's wrangler, your company's internal tools, any CLI released after the training cutoff. Here the agent truly starts from zero.

| CLI | Full docs | Progressive | Savings | In training data? |
| --- | --- | --- | --- | --- |
| wrangler | ~11K | ~2K (deploy) | 82% | Partially |
| kubectl | ~30K | ~1.5K (restart) | 95% | Yes |
| gh | ~116K | ~2.1K (create PR) | 98% | Yes |
| your-cli | ? | ? | | No |

The last row is the one that matters. For your CLI — internal tools, new projects, anything post-training-cutoff — the agent has zero knowledge and must discover everything. (Methodology: crawled subcommand help two levels deep, ~4 chars/token. These numbers measure help text size, not actual agent consumption. Real usage depends on whether the agent navigates correctly or backtracks.)
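The methodology above can be sketched in a few lines of Python. This is a rough reconstruction under the article's stated assumptions (crawl `--help` two levels deep, ~4 chars/token); the `parse_subcommands` heuristic for finding a "Commands:" section is my own naive guess and will miss CLIs with different help layouts.

```python
import subprocess

def estimate_tokens(text: str) -> int:
    # Rough heuristic from the article: ~4 characters per token.
    return len(text) // 4

def parse_subcommands(help_text: str) -> list[str]:
    # Naive: take the first word of each indented line after a "Commands:"
    # header; a non-indented line ends the section.
    cmds, in_section = [], False
    for line in help_text.splitlines():
        if line.strip().lower().startswith("commands"):
            in_section = True
            continue
        if in_section:
            if line.startswith((" ", "\t")) and line.strip():
                cmds.append(line.split()[0])
            elif line.strip():
                break
    return cmds

def crawl_help(cli: str, max_depth: int = 2) -> int:
    """Sum estimated tokens of help text, two subcommand levels deep."""
    total, frontier = 0, [[]]  # frontier holds subcommand paths; [] is the root
    for depth in range(max_depth):
        next_frontier = []
        for path in frontier:
            out = subprocess.run(
                [cli, *path, "--help"], capture_output=True, text=True
            ).stdout
            total += estimate_tokens(out)
            if depth + 1 < max_depth:
                next_frontier += [path + [c] for c in parse_subcommands(out)]
        frontier = next_frontier
    return total
```

Note the caveat from the article applies here too: this measures the size of the help tree, not what an agent actually reads on a successful (or backtracking) run.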

The catch: help text quality

Progressive discovery only saves tokens when the agent navigates correctly on the first try. Vague descriptions cause backtracking.

Agent can't navigate this:

$ mycli --help
Commands:
  manage   Manage things
  config   Configuration
  run      Run stuff

Agent navigates in one step:

$ mycli --help
Commands:
  manage   Create, update, and delete deployments
  config   Get/set configuration values and profiles
  run      Execute a deployment pipeline for a service

Vague top-level help means the agent tries manage --help, backtracks to run --help — 3x the tokens, 2 wasted turns. For niche CLIs where the model can't fall back on training knowledge, help text quality is the entire game.
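In code, the fix is often one string per subcommand. A minimal sketch with Python's argparse (the `mycli` tool and its subcommands are hypothetical, mirroring the example above): the `help=` string on each subparser is exactly the one line an agent sees in the top-level help, so it should name concrete verbs and nouns rather than echo the command name.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mycli")
    sub = parser.add_subparsers(dest="command", title="Commands")
    # Each help= string is the single line shown in `mycli --help` --
    # descriptive text here is what lets an agent navigate in one step.
    sub.add_parser("manage", help="Create, update, and delete deployments")
    sub.add_parser("config", help="Get/set configuration values and profiles")
    sub.add_parser("run", help="Execute a deployment pipeline for a service")
    return parser

if __name__ == "__main__":
    print(build_parser().format_help())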

But --help wasn't designed for agents

Progressive --help works, but it's freeform text designed for humans. There's no way to express workflows, common patterns, or “if you're trying to do X, start here.” And the AGENTS.md bootstrap hint is manual and unstandardized.

What if CLIs shipped a discovery mechanism designed for agents? A structured, progressive cli skills command that returns exactly what an agent needs at each level — with examples, workflows, and machine-readable output.
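To make the idea concrete, here is one possible shape such a response could take. Everything below is hypothetical: the `cli skills` command, the field names, and the workflow entries are assumptions, not an existing standard; the `wrangler` subcommands shown are real, but the JSON wrapper around them is invented for illustration.

```python
import json

# Hypothetical top-level response of a structured `cli skills` command.
# Every field name here is an assumption, not an existing protocol.
skills_index = {
    "cli": "wrangler",
    "skills": [
        {
            "name": "deploy",
            "summary": "Publish a Worker to Cloudflare",
            "detail": "wrangler skills deploy",  # drill-down, like `deploy --help`
        },
        {
            "name": "dev",
            "summary": "Run a local dev server with live reload",
            "detail": "wrangler skills dev",
        },
    ],
    # Workflows express "if you're trying to do X, start here" --
    # the thing freeform --help text has no way to say.
    "workflows": [
        {
            "goal": "Ship a new Worker",
            "steps": ["wrangler init", "wrangler dev", "wrangler deploy"],
        }
    ],
}

print(json.dumps(skills_index, indent=2))
```

Machine-readable output like this lets the agent skip parsing prose entirely and jump straight to the drill-down it needs.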

That's what we explore in Part 2: Designing a CLI Skills Protocol.

How efficient is your CLI for agents?

Evaluate your CLI with real LLM agents. Measure pass rate, turns, and token consumption per task.