# agent extracts data from API response
$ echo '[{"name":"a","score":95},{"name":"b","score":72}]' | jq '.[] | select(.score > 80) | .name'
"a"
Can AI agents use jq?
A lightweight command-line JSON processor. Agents use it to parse, filter, transform, and format JSON data from APIs and files.
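As a minimal sketch of those four operations, assuming jq 1.5 or later is on the PATH and reusing the sample array from the example at the top of this page (the output key `who` and the variable names are illustrative only):

```shell
#!/bin/sh
# Sample data reused from the example at the top of the page.
data='[{"name":"a","score":95},{"name":"b","score":72}]'

# Parse: validate the input and pretty-print it.
printf '%s\n' "$data" | jq '.'

# Filter: names of entries scoring above 80, as raw strings.
high=$(printf '%s\n' "$data" | jq -r '.[] | select(.score > 80) | .name')
echo "$high"

# Transform: reshape each entry into a new object, printed compactly.
shaped=$(printf '%s\n' "$data" | jq -c 'map({who: .name, passed: (.score > 80)})')
echo "$shaped"

# Format: one tab-separated text line per entry.
printf '%s\n' "$data" | jq -r '.[] | [.name, .score] | @tsv'
```

The `-r` and `-c` flags only change how results are printed; the filter expression decides which values are produced.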
See the latest run →
jq eval results by model
| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 88% | 1.9 | 5.8k |
jq task results by model
| Task | Difficulty | Description | gpt-5-nano |
|---|---|---|---|
| sort-keys-output | medium | Print the contents of bench-unordered.json with all object keys sorted alphabetically. | ✓ 1t |
| quickstart-pretty-print | easy | The file bench-raw.json contains minified JSON. Use jq to pretty-print it to stdout. | ✓ 1t |
| quickstart-extract-field | easy | Extract the 'city' field from the nested 'address' object in bench-person.json. Print just the string value. | ✓ 3t |
| quickstart-array-first | easy | Get the first element from the JSON array in bench-items.json and print it. | ✓ 2t |
| discover-version | easy | Check what version of jq is installed and print it. | ✓ 1t |
| discover-compact-flag | easy | Print the contents of bench-data.json as a single compact line (no pretty-printing). | ✓ 1t |
| config-null-input-construct | medium | Without any input file, use jq to construct and print a JSON object with keys 'project' (value 'benchmark') and 'version' (value 1). Do not create any input files. | ✓ 1t |
| error-optional-operator | hard | The file bench-mixed.json contains an array with both objects and non-objects. Extract the 'name' field from each element without producing errors for non-object elements. | ✓ 3t |
| config-arg-variable | medium | Use jq's --arg feature to pass the value 'production' as a variable named 'env', then construct a JSON object {"environment": $env} and print it. Do not use any input file. | ✓ 1t |
| config-from-file-filter | medium | Write a jq filter to bench-filter.jq that selects objects where .status == "active", then run jq with that filter file against bench-users.json. | ✓ 5t |
| raw-string-output | medium | Extract the 'name' field from bench-record.json and print it as a raw string (no JSON quotes around it). | ✓ 1t |
| slurp-combine-inputs | medium | There are three separate JSON files: bench-a.json, bench-b.json, bench-c.json. Use jq to slurp them all into a single JSON array and print it. | ✓ 1t |
| error-exit-status | hard | Use jq with the exit-status flag to check bench-check.json. The filter should test if .enabled is true. Print the result and ensure jq exits with a non-zero status if the value is false. | ✗ 1t |
| tutorial-extract-transform | hard | Read bench-commits.json (an array of commit objects). For each commit, extract just the 'message' and 'author' fields into a new object like {"msg": .message, "who": .author}. Print the resulting array. | ✗ 2t |
| workflow-aggregate-report | hard | Read bench-sales.json (an array of sale records with 'region' and 'amount' fields). Group the sales by region, then for each region compute the total amount. Print the result as an array of {"region": ..., "total": ...} objects. | ✓ 3t |
| workflow-merge-and-enrich | hard | Merge data from two files: bench-products.json has [{"id": 1, "name": "Widget"}, ...] and bench-prices.json has {"1": 9.99, "2": 24.99, ...}. Produce an array where each product object gains a 'price' field looked up by its id. Write the result to bench-enriched.json. | ✓ 2t |
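Most of the easy and medium tasks in the table come down to a single jq command-line option. The commands below are one plausible solution per task, not the commands the model actually ran. The fixture contents are copied from the task suite's `setup` steps and written to a scratch directory; the shell variable names are illustrative only. The sketch assumes jq 1.5 or later.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Fixtures copied from the task suite's setup steps.
echo '{"a": 1, "b": 2, "c": 3}' > bench-data.json
echo '{"name": "hello-world", "type": "greeting"}' > bench-record.json
echo '{"zebra": 1, "apple": 2, "mango": 3, "banana": 4}' > bench-unordered.json
echo '{"id": 1}' > bench-a.json
echo '{"id": 2}' > bench-b.json
echo '{"id": 3}' > bench-c.json
echo '{"enabled": false, "name": "test"}' > bench-check.json
echo '[{"name": "Alice", "status": "active"}, {"name": "Bob", "status": "inactive"}, {"name": "Carol", "status": "active"}]' > bench-users.json

# discover-compact-flag: -c prints each result on a single line.
compact=$(jq -c '.' bench-data.json)
echo "$compact"

# raw-string-output: -r drops the JSON quotes around string results.
raw=$(jq -r '.name' bench-record.json)
echo "$raw"

# sort-keys-output: -S sorts object keys alphabetically on output.
sorted=$(jq -S '.' bench-unordered.json)
echo "$sorted"

# slurp-combine-inputs: -s reads all inputs into one array.
slurped=$(jq -s '.' bench-a.json bench-b.json bench-c.json)
echo "$slurped"

# config-null-input-construct: -n runs the filter with null as its input.
built=$(jq -cn '{project: "benchmark", version: 1}')
echo "$built"

# config-arg-variable: --arg binds a string to $env inside the filter.
env_obj=$(jq -cn --arg env production '{environment: $env}')
echo "$env_obj"

# config-from-file-filter: -f reads the filter from a file.
printf '%s\n' '.[] | select(.status == "active")' > bench-filter.jq
active=$(jq -c -f bench-filter.jq bench-users.json)
echo "$active"

# error-exit-status: -e exits with status 1 when the last output is false or null.
jq -e '.enabled == true' bench-check.json
status=$?
echo "exit status: $status"
```

With the `bench-check.json` fixture above, the last command prints `false` and leaves `1` in `$status`, which is the behaviour the error-exit-status task asks for.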
Task suite source (249 lines, YAML)
- id: quickstart-pretty-print
  intent: The file bench-raw.json contains minified JSON. Use jq to pretty-print
    it to stdout.
  assert:
    - ran: jq
    - output_contains: Alice
    - output_contains: scores
  setup:
    - echo '{"name":"Alice","scores":[90,85,92],"active":true}' > bench-raw.json
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/content/tutorial/default.yml#pretty-printing
- id: quickstart-extract-field
  intent: Extract the 'city' field from the nested 'address' object in
    bench-person.json. Print just the string value.
  assert:
    - ran: jq
    - output_contains: Portland
  setup:
    - "echo '{\"name\": \"Alice\", \"address\": {\"city\": \"Portland\",
      \"state\": \"OR\", \"zip\": \"97201\"}}' > bench-person.json"
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/content/manual/manual.yml#Object Identifier-Index
- id: quickstart-array-first
  intent: Get the first element from the JSON array in bench-items.json and print it.
  assert:
    - ran: jq
    - output_contains: Widget
  setup:
    - "echo '[{\"id\": 1, \"name\": \"Widget\"}, {\"id\": 2, \"name\":
      \"Gadget\"}, {\"id\": 3, \"name\": \"Sprocket\"}]' > bench-items.json"
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/content/tutorial/default.yml#array-indexing
- id: discover-version
  intent: Check what version of jq is installed and print it.
  assert:
    - ran: jq
    - output_contains: jq-
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: discover-compact-flag
  intent: Print the contents of bench-data.json as a single compact line (no
    pretty-printing).
  assert:
    - ran: jq
    - output_contains: '"a":1'
  setup:
    - "echo '{\"a\": 1, \"b\": 2, \"c\": 3}' > bench-data.json"
  max_turns: 4
  difficulty: easy
  category: command-discovery
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: config-null-input-construct
  intent: Without any input file, use jq to construct and print a JSON object with
    keys 'project' (value 'benchmark') and 'version' (value 1). Do not create
    any input files.
  assert:
    - ran: jq
    - output_contains: benchmark
    - output_contains: version
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: config-arg-variable
  intent: "Use jq's --arg feature to pass the value 'production' as a variable
    named 'env', then construct a JSON object {\"environment\": $env} and print
    it. Do not use any input file."
  assert:
    - ran: jq
    - output_contains: environment
    - output_contains: production
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: config-from-file-filter
  intent: Write a jq filter to bench-filter.jq that selects objects where .status
    == "active", then run jq with that filter file against bench-users.json.
  assert:
    - ran: jq
    - file_exists: bench-filter.jq
    - output_contains: Alice
    - output_contains: Carol
  setup:
    - "echo '[{\"name\": \"Alice\", \"status\": \"active\"}, {\"name\": \"Bob\",
      \"status\": \"inactive\"}, {\"name\": \"Carol\", \"status\": \"active\"}]'
      > bench-users.json"
  max_turns: 6
  difficulty: medium
  category: config
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: raw-string-output
  intent: Extract the 'name' field from bench-record.json and print it as a raw
    string (no JSON quotes around it).
  assert:
    - ran: jq
    - output_contains: hello-world
  setup:
    - "echo '{\"name\": \"hello-world\", \"type\": \"greeting\"}' >
      bench-record.json"
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: slurp-combine-inputs
  intent: "There are three separate JSON files: bench-a.json, bench-b.json,
    bench-c.json. Use jq to slurp them all into a single JSON array and print
    it."
  assert:
    - ran: jq
    - output_contains: '"id": 1'
    - output_contains: '"id": 3'
  setup:
    - "echo '{\"id\": 1}' > bench-a.json"
    - "echo '{\"id\": 2}' > bench-b.json"
    - "echo '{\"id\": 3}' > bench-c.json"
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: sort-keys-output
  intent: Print the contents of bench-unordered.json with all object keys sorted
    alphabetically.
  assert:
    - ran: jq
    - verify:
        run: jq -S '.' bench-unordered.json
        output_contains: apple
  setup:
    - "echo '{\"zebra\": 1, \"apple\": 2, \"mango\": 3, \"banana\": 4}' >
      bench-unordered.json"
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: error-optional-operator
  intent: The file bench-mixed.json contains an array with both objects and
    non-objects. Extract the 'name' field from each element without producing
    errors for non-object elements.
  assert:
    - ran: jq
    - output_contains: Alice
    - output_contains: Bob
    - output_contains: Carol
  setup:
    - "echo '[{\"name\": \"Alice\"}, 42, {\"name\": \"Bob\"}, null, {\"name\":
      \"Carol\"}]' > bench-mixed.json"
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/content/manual/manual.yml#Optional Object Identifier-Index
- id: error-exit-status
  intent: Use jq with the exit-status flag to check bench-check.json. The filter
    should test if .enabled is true. Print the result and ensure jq exits with a
    non-zero status if the value is false.
  assert:
    - ran: jq
    - output_contains: "false"
  setup:
    - "echo '{\"enabled\": false, \"name\": \"test\"}' > bench-check.json"
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/content/manual/manual.yml#Invoking jq
- id: tutorial-extract-transform
  intent: "Read bench-commits.json (an array of commit objects). For each commit,
    extract just the 'message' and 'author' fields into a new object like
    {\"msg\": .message, \"who\": .author}. Print the resulting array."
  assert:
    - ran: jq
    - output_contains: msg
    - output_contains: who
    - output_contains: alice
    - output_contains: resolve null pointer
  setup:
    - >
      cat > bench-commits.json << 'EOF'
      [
      {"sha": "abc123", "message": "fix: resolve null pointer", "author": "alice", "date": "2025-01-15"},
      {"sha": "def456", "message": "feat: add search endpoint", "author": "bob", "date": "2025-01-16"},
      {"sha": "ghi789", "message": "docs: update README", "author": "carol", "date": "2025-01-17"}
      ]
      EOF
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/content/tutorial/default.yml#object-construction
- id: workflow-aggregate-report
  intent: "Read bench-sales.json (an array of sale records with 'region' and
    'amount' fields). Group the sales by region, then for each region compute
    the total amount. Print the result as an array of {\"region\": ...,
    \"total\": ...} objects."
  assert:
    - ran: jq
    - output_contains: west
    - output_contains: east
    - output_contains: "300"
    - output_contains: "500"
  setup:
    - |
      cat > bench-sales.json << 'EOF'
      [
      {"region": "west", "amount": 100},
      {"region": "east", "amount": 200},
      {"region": "west", "amount": 150},
      {"region": "east", "amount": 300},
      {"region": "west", "amount": 50}
      ]
      EOF
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/content/manual/manual.yml#Builtin operators and functions
- id: workflow-merge-and-enrich
  intent: "Merge data from two files: bench-products.json has [{\"id\": 1,
    \"name\": \"Widget\"}, ...] and bench-prices.json has {\"1\": 9.99, \"2\":
    24.99, ...}. Produce an array where each product object gains a 'price'
    field looked up by its id. Write the result to bench-enriched.json."
  assert:
    - ran: jq
    - file_exists: bench-enriched.json
    - file_contains:
        path: bench-enriched.json
        text: Widget
    - file_contains:
        path: bench-enriched.json
        text: "9.99"
  setup:
    - "echo '[{\"id\": 1, \"name\": \"Widget\"}, {\"id\": 2, \"name\":
      \"Gadget\"}]' > bench-products.json"
    - "echo '{\"1\": 9.99, \"2\": 24.99}' > bench-prices.json"
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/content/manual/manual.yml#Invoking jq
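The hard tasks in the suite above depend on filter expressions rather than flags. The following is a sketch of one way each could be solved, not a record of what any model ran. It assumes jq 1.5 or later (for `--slurpfile`), recreates the fixtures from the `setup` steps in a scratch directory, and uses illustrative shell variable names.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Fixtures copied from the task suite's setup steps.
echo '[{"name": "Alice"}, 42, {"name": "Bob"}, null, {"name": "Carol"}]' > bench-mixed.json
cat > bench-commits.json << 'EOF'
[
{"sha": "abc123", "message": "fix: resolve null pointer", "author": "alice", "date": "2025-01-15"},
{"sha": "def456", "message": "feat: add search endpoint", "author": "bob", "date": "2025-01-16"},
{"sha": "ghi789", "message": "docs: update README", "author": "carol", "date": "2025-01-17"}
]
EOF
cat > bench-sales.json << 'EOF'
[
{"region": "west", "amount": 100},
{"region": "east", "amount": 200},
{"region": "west", "amount": 150},
{"region": "east", "amount": 300},
{"region": "west", "amount": 50}
]
EOF
echo '[{"id": 1, "name": "Widget"}, {"id": 2, "name": "Gadget"}]' > bench-products.json
echo '{"1": 9.99, "2": 24.99}' > bench-prices.json

# error-optional-operator: `?` suppresses the error from indexing a non-object
# (42), and `// empty` drops the null that indexing `null` yields.
names=$(jq -c '[.[] | .name? // empty]' bench-mixed.json)
echo "$names"

# tutorial-extract-transform: build a new object per commit with map.
commits=$(jq -c 'map({msg: .message, who: .author})' bench-commits.json)
echo "$commits"

# workflow-aggregate-report: group_by sorts and groups by region,
# then each group's amounts are summed with add.
report=$(jq -c 'group_by(.region)
  | map({region: .[0].region, total: (map(.amount) | add)})' bench-sales.json)
echo "$report"

# workflow-merge-and-enrich: --slurpfile binds $prices to an array holding the
# file's contents; product ids are numbers, so convert them to string keys.
jq --slurpfile prices bench-prices.json \
  'map(. + {price: $prices[0][.id | tostring]})' \
  bench-products.json > bench-enriched.json
cat bench-enriched.json
```

With the sales fixture, `group_by` returns the groups in sorted key order, so the report lists `east` (total 500) before `west` (total 300).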
Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 16 tasks using @cliwatch/cli-bench.
What you get with CLIWatch
Everything below is running live for jq — see the latest run. Set up the same for your CLI in minutes.
| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |
CI & PR Comments
Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
Track Over Time
See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%
Quality Gates
Set per-model pass rate thresholds. CI fails if evals drop below your targets.
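The gating logic can be pictured as a per-model comparison. The sketch below is an illustration only: the JSON shapes, the variable names, and the mapping of the table's model names onto the threshold keys are assumptions, not CLIWatch's actual report format or implementation. It reuses the pass rates and thresholds shown on this page and needs jq 1.5 or later for `--argjson`.

```shell
#!/bin/sh
# Hypothetical inputs: these JSON shapes are assumptions for illustration,
# not CLIWatch's real report or config format.
rates='{"claude-sonnet-4-5": 95, "gpt-4.1": 80, "claude-haiku-4-5": 65}'
thresholds='{"claude-sonnet-4-5": 80, "gpt-4.1": 75, "claude-haiku-4-5": 60}'

# List every model whose pass rate is below its threshold.
failing=$(jq -rn --argjson rates "$rates" --argjson min "$thresholds" \
  '$min | to_entries[] | select($rates[.key] < .value) | .key')

# Record the gate result; a CI step could end with: exit "$gate"
if [ -n "$failing" ]; then
  echo "below threshold: $failing"
  gate=1
else
  echo "all models meet their thresholds"
  gate=0
fi
```

A model missing from `rates` compares as `null`, which jq orders below every number, so it would be reported as failing rather than silently skipped.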
Get this for your CLI
Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.