CLI Intelligence
Your evals show pass/fail. Intelligence tells you why agents struggle with your CLI and exactly what to fix. AI-powered analysis of your benchmark traces, with projected impact for every recommendation.
Example insight: Only 38% of first command attempts succeed, and agents consistently stumble on the plural subcommand name. Help text is effective once found (85% success after reading --help), which indicates the CLI's functionality is well documented but poorly surfaced through command naming.
How it works
Actionable Recommendations
Specific CLI changes ranked by severity. Not generic advice, but concrete fixes like "add a singular alias for 'checks'", each backed by how often the problem appears in your traces.
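As an illustration of what a fix like the singular alias could look like, here is a minimal sketch for a CLI built with Python's argparse. The program name `mycli` and the `command` attribute are assumptions made for the example; your CLI framework may expose aliases differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "mycli" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(prog="mycli")
    subparsers = parser.add_subparsers(dest="invoked_as", required=True)

    # Register the plural subcommand and accept the singular form as an alias.
    checks = subparsers.add_parser("checks", aliases=["check"], help="List checks")
    # Store a canonical name so downstream code does not care which form was typed.
    checks.set_defaults(command="checks")
    return parser


args_plural = build_parser().parse_args(["checks"])
args_singular = build_parser().parse_args(["check"])
```

Both invocations resolve to the same canonical `command`, so an agent that guesses the singular form no longer fails on its first attempt.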
Projected Impact
See how your pass rate, discovery cost, and average turns would improve if you implemented each recommendation. Data-driven prioritization.
Built on Your Eval Data
Intelligence analyzes the traces from your actual benchmark runs. It sees what agents try, where they fail, and why. No synthetic data, no guesswork.
What Intelligence surfaces
Every insight is derived from your real agent traces.
Command Chain Analysis
Agents chain list, then get, then show when a single describe could do it all. Intelligence surfaces composite command opportunities that cut agent turns.
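To make the idea concrete, the sketch below counts repeated command sequences across traces. It is not Intelligence's actual analysis. The trace format (one list of subcommand names per agent run) and the function name `frequent_chains` are assumptions for illustration.

```python
from collections import Counter


def frequent_chains(traces, length=3, min_count=2):
    """Count every run of `length` consecutive commands and keep the repeated ones."""
    counts = Counter()
    for commands in traces:
        for i in range(len(commands) - length + 1):
            counts[tuple(commands[i:i + length])] += 1
    return [(chain, n) for chain, n in counts.most_common() if n >= min_count]


# Hypothetical traces: each inner list is the ordered subcommands from one agent run.
traces = [
    ["list", "get", "show"],
    ["list", "get", "show", "delete"],
    ["--help", "list", "get", "show"],
    ["create", "list"],
]

result = frequent_chains(traces)
```

A chain that recurs across many runs, such as list, get, show here, is a candidate for a single composite command.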
Error Recovery Patterns
Agents hit an error, then try three flag variations before finding the right one. Intelligence identifies which error messages need better suggestions and where "did you mean?" prompts would eliminate retry loops.
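A "did you mean?" prompt can be as simple as a closest-match lookup. The sketch below uses Python's standard difflib module. The flag list and the wording of the messages are hypothetical, and the 0.6 similarity cutoff is an assumption you would tune for your own CLI.

```python
import difflib

# Hypothetical set of valid flags for the example.
KNOWN_FLAGS = ["--output", "--format", "--verbose", "--dry-run"]


def suggest(unknown, known=KNOWN_FLAGS):
    """Build an error message that names the closest valid flag, if one is close enough."""
    matches = difflib.get_close_matches(unknown, known, n=1, cutoff=0.6)
    if matches:
        return f"unknown flag '{unknown}'. Did you mean '{matches[0]}'?"
    # No near match: point the agent at help instead of letting it guess again.
    return f"unknown flag '{unknown}'. Run with --help to list valid flags."


typo_message = suggest("--outptu")
fallback_message = suggest("--zzz")
```

Putting the suggestion in the error itself gives the agent the correct flag on the next turn instead of after several retries.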
Help Text Effectiveness
After reading --help, 85% of attempts succeed. But only 40% of agents try it first. Intelligence shows where discoverability breaks down and what to surface in error messages so agents can skip the help lookup.
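The two numbers above can be approximated from raw traces. The sketch below shows one way to compute them, assuming a hypothetical trace format in which each run is an ordered list of attempts with a command string and a success flag. The `Attempt` type, the `help_metrics` function, and the sample data are all illustrative, not Intelligence's schema.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    command: str
    succeeded: bool


def help_metrics(traces):
    """Return how often agents start with --help and how attempts fare after reading it."""
    help_first = 0
    after_help_total = 0
    after_help_ok = 0
    for attempts in traces:
        if attempts and "--help" in attempts[0].command:
            help_first += 1
        seen_help = False
        for attempt in attempts:
            if "--help" in attempt.command:
                seen_help = True
                continue
            if seen_help:
                after_help_total += 1
                after_help_ok += int(attempt.succeeded)
    return {
        "help_first_rate": help_first / len(traces) if traces else 0.0,
        "success_after_help": after_help_ok / after_help_total if after_help_total else 0.0,
    }


# Hypothetical sample runs against a CLI called "mycli".
sample = [
    [Attempt("mycli --help", True), Attempt("mycli checks list", True)],
    [Attempt("mycli check list", False), Attempt("mycli --help", True),
     Attempt("mycli checks list", True)],
    [Attempt("mycli check", False), Attempt("mycli chk", False)],
    [Attempt("mycli --help", True), Attempt("mycli checks ls", False)],
]

metrics = help_metrics(sample)
```

A large gap between the two rates suggests the help text works but agents are not being steered to it early enough.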
PR Impact Preview
Renaming a subcommand? Intelligence predicts the impact: "This rename will break 3/12 agent tasks. Here's the predicted new pass rate." Regression prevention, not just detection.
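For intuition, a naive version of this check can be sketched as follows. It is a deliberately simple approximation, not how Intelligence makes its prediction: it assumes every task whose trace used the old name will fail after the rename. The task records and the `rename_impact` function are hypothetical.

```python
def rename_impact(tasks, old_name):
    """Estimate which tasks a rename touches and the pass rate if all of them break."""
    affected = sorted(name for name, t in tasks.items() if old_name in t["commands"])
    still_passing = [
        name for name, t in tasks.items()
        if t["passed"] and old_name not in t["commands"]
    ]
    return {
        "affected": affected,
        "projected_pass_rate": len(still_passing) / len(tasks) if tasks else 0.0,
    }


# Hypothetical benchmark results: subcommands each task used, and whether it passed.
tasks = {
    "list-open-checks": {"commands": {"checks", "list"}, "passed": True},
    "deploy-service": {"commands": {"deploy"}, "passed": True},
    "rerun-check": {"commands": {"checks"}, "passed": False},
    "show-status": {"commands": {"status"}, "passed": True},
}

impact = rename_impact(tasks, "checks")
```

In this sample, two of four tasks touch the renamed subcommand and the projected pass rate drops from 75% to 50%, which is the kind of signal worth seeing before the rename is merged.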
Start with free evals today
Intelligence builds on your eval data. Start running benchmarks now and you'll be first in line when Intelligence launches.
Get Started Free