# agent tests an API endpoint
$ curl -s https://httpbin.org/get | jq .origin
"203.0.113.42"
$ curl -w '%{http_code}' -o /dev/null -s https://example.com
200
Can AI agents use curl?
The universal command-line HTTP client. Agents use it to make API calls, download files, test endpoints, and debug HTTP traffic.
See the latest run →

curl eval results by model
| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 89% | 2.8 | 7.9k |
curl task results by model
| Task | Difficulty | Description | gpt-5-nano |
|---|---|---|---|
| flags-json-post | medium | Use curl's --json flag to POST the data '{"task": "benchmark", "score": 95}' to https://httpbin.org/post. Save the response to bench-json-result.json. | ✓ 3t |
| flags-dump-header-separate | medium | Use curl to fetch https://httpbin.org/response-headers?X-Bench-Tag=hello. Save the response body to bench-body.json and the response headers to a separate file bench-resp-headers.txt using the --dump-header flag. | ✗ 2t |
| error-fail-on-http-error | hard | Use curl with the --fail flag to request https://httpbin.org/status/404. Since it returns a 404, curl should exit with a non-zero exit code. Capture the exit code in a file called bench-exit-code.txt (just the number). | ✓ 2t |
| error-timeout-handling | hard | Use curl with a maximum time limit of 3 seconds to request https://httpbin.org/delay/1. Save the response to bench-timeout-ok.json. This should succeed since the delay (1s) is within the timeout (3s). | ✓ 3t |
| error-retry-request | hard | Use curl with --retry 3 to fetch https://httpbin.org/get. Save the output to bench-retry-result.json. Also use --write-out to append the HTTP status code to a file called bench-retry-status.txt. | ✓ 2t |
| workflow-cookie-session | hard | Simulate a session with cookies. First, use curl to visit https://httpbin.org/cookies/set/bench_session/abc123 with -L to follow the redirect, saving cookies to bench-cookies.txt. Then make a second request to https://httpbin.org/cookies using those saved cookies, and save the response to bench-cookie-check.json. | ✓ 3t |
| quickstart-simple-get | easy | Use curl to fetch the page at https://httpbin.org/get and print the response to stdout. | ✓ 1t |
| quickstart-save-to-file | easy | Use curl to download the JSON response from https://httpbin.org/get and save it to a file called bench-response.json. | ✓ 3t |
| quickstart-verbose-request | easy | Use curl in verbose mode to fetch https://httpbin.org/get. The verbose output should show the request and response headers. | ✓ 2t |
| discover-version-protocols | easy | Check what version of curl is installed and what protocols it supports. Print the full version output. | ✓ 1t |
| discover-help-categories | easy | Show the curl help categories. curl organizes its flags by category (e.g., http, ftp, tls). List the available categories. | ✓ 4t |
| config-file-request | medium | Create a curl config file called bench-curl.conf that sets the User-Agent to 'BenchBot/1.0', enables silent mode, and follows redirects. Then use curl with that config file to fetch https://httpbin.org/user-agent and save the output to bench-agent.json. | ✓ 3t |
| config-custom-headers | medium | Use curl to send a GET request to https://httpbin.org/headers with three custom headers: X-Request-ID set to 'bench-123', Accept set to 'application/json', and X-Custom-Auth set to 'token-abc'. Save the response to bench-headers.json. | ✗ 2t |
| flags-head-request | medium | Use curl to send a HEAD request to https://httpbin.org/get (fetching only the response headers, not the body). Save the headers to a file called bench-head-headers.txt. | ✓ 2t |
| flags-write-out-status | medium | Use curl's --write-out (or -w) option to fetch https://httpbin.org/status/201 silently and print only the HTTP status code to stdout. Do not print the response body. | ✓ 1t |
| workflow-follow-and-measure | hard | Use curl to follow redirects from https://httpbin.org/redirect/3 silently. Use --write-out to capture the total time, number of redirects, and final HTTP status code. Write these three values (one per line, labeled) to bench-metrics.txt. Also save the final response body to bench-final.json. | ✓ 7t |
| workflow-post-then-verify | hard | First, POST JSON data '{"id": 42, "name": "bench-item"}' to https://httpbin.org/post using curl and save the response to bench-post-response.json. Then extract the 'id' value from the response (the server echoes it back in the 'json' field) and verify it equals 42 by writing 'PASS' or 'FAIL' to bench-verify.txt. | ✓ 4t |
| workflow-form-upload | hard | Create a text file bench-upload.txt with the content 'Hello from CLIWatch'. Then use curl to upload it as a multipart form field named 'file' to https://httpbin.org/post. Save the response to bench-upload-result.json, which should contain the file contents echoed back. | ✓ 3t |
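
As one illustration of what these tasks ask for, the `error-fail-on-http-error` task can be sketched as a small shell function. This is a minimal sketch, not the command any model actually ran; the function name `capture_fail_exit_code` and its two parameters are hypothetical. It relies on curl's documented behavior that `--fail` makes HTTP responses of 400 and above exit with code 22, which is the value the task's assertion looks for.

```shell
# Hypothetical helper (not part of the task suite): run curl with --fail
# and record its exit code in a file.
capture_fail_exit_code() {
  local url="$1"
  local out_file="$2"
  local code=0
  # --fail turns HTTP errors (status >= 400) into a non-zero exit code;
  # the response body is discarded.
  curl --fail --silent --output /dev/null "$url" || code=$?
  # Write just the number, as the task requires.
  printf '%s\n' "$code" > "$out_file"
}
```

Calling `capture_fail_exit_code https://httpbin.org/status/404 bench-exit-code.txt` should leave `22` in the file, assuming the endpoint responds with a 404 as the task describes.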
Task suite source (267 lines, YAML)
- id: quickstart-simple-get
  intent: Use curl to fetch the page at https://httpbin.org/get and print the
    response to stdout.
  assert:
    - ran: curl
    - output_contains: httpbin.org
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Simple Usage
- id: quickstart-save-to-file
  intent: Use curl to download the JSON response from https://httpbin.org/get and
    save it to a file called bench-response.json.
  assert:
    - ran: curl
    - file_exists: bench-response.json
    - file_contains:
        path: bench-response.json
        text: httpbin.org
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Download to a File
- id: quickstart-verbose-request
  intent: Use curl in verbose mode to fetch https://httpbin.org/get. The verbose
    output should show the request and response headers.
  assert:
    - ran: curl.*-v|curl.*--verbose
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Verbose / Debug
- id: discover-version-protocols
  intent: Check what version of curl is installed and what protocols it supports.
    Print the full version output.
  assert:
    - ran: curl.*--version|-V
    - output_contains: curl
    - output_contains: http
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: discover-help-categories
  intent: Show the curl help categories. curl organizes its flags by category
    (e.g., http, ftp, tls). List the available categories.
  assert:
    - ran: curl.*--help|curl.*-h
  setup: []
  max_turns: 4
  difficulty: easy
  category: command-discovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: config-file-request
  intent: Create a curl config file called bench-curl.conf that sets the
    User-Agent to 'BenchBot/1.0', enables silent mode, and follows redirects.
    Then use curl with that config file to fetch https://httpbin.org/user-agent
    and save the output to bench-agent.json.
  assert:
    - ran: curl
    - file_exists: bench-curl.conf
    - file_exists: bench-agent.json
    - file_contains:
        path: bench-agent.json
        text: BenchBot
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/MANUAL.md#Simple Usage
- id: config-custom-headers
  intent: "Use curl to send a GET request to https://httpbin.org/headers with
    three custom headers: X-Request-ID set to 'bench-123', Accept set to
    'application/json', and X-Custom-Auth set to 'token-abc'. Save the response
    to bench-headers.json."
  assert:
    - ran: curl
    - file_exists: bench-headers.json
    - file_contains:
        path: bench-headers.json
        text: bench-123
    - file_contains:
        path: bench-headers.json
        text: token-abc
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/MANUAL.md#HTTP
- id: flags-head-request
  intent: Use curl to send a HEAD request to https://httpbin.org/get (fetching
    only the response headers, not the body). Save the headers to a file called
    bench-head-headers.txt.
  assert:
    - ran: curl.*-I|curl.*--head|curl.*-D
    - file_exists: bench-head-headers.txt
    - file_contains:
        path: bench-head-headers.txt
        text: "200"
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: flags-write-out-status
  intent: Use curl's --write-out (or -w) option to fetch
    https://httpbin.org/status/201 silently and print only the HTTP status code
    to stdout. Do not print the response body.
  assert:
    - ran: curl.*-w|curl.*--write-out
    - output_contains: "201"
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: flags-json-post
  intent: "Use curl's --json flag to POST the data '{\"task\": \"benchmark\",
    \"score\": 95}' to https://httpbin.org/post. Save the response to
    bench-json-result.json."
  assert:
    - ran: curl.*--json|curl.*-d.*application/json
    - file_exists: bench-json-result.json
    - file_contains:
        path: bench-json-result.json
        text: benchmark
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#POST (HTTP)
- id: flags-dump-header-separate
  intent: Use curl to fetch
    https://httpbin.org/response-headers?X-Bench-Tag=hello. Save the response
    body to bench-body.json and the response headers to a separate file
    bench-resp-headers.txt using the --dump-header flag.
  assert:
    - ran: curl.*-D|curl.*--dump-header
    - file_exists: bench-body.json
    - file_exists: bench-resp-headers.txt
    - file_contains:
        path: bench-resp-headers.txt
        text: X-Bench-Tag
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: error-fail-on-http-error
  intent: Use curl with the --fail flag to request https://httpbin.org/status/404.
    Since it returns a 404, curl should exit with a non-zero exit code. Capture
    the exit code in a file called bench-exit-code.txt (just the number).
  assert:
    - ran: curl.*-f|curl.*--fail
    - file_exists: bench-exit-code.txt
    - file_contains:
        path: bench-exit-code.txt
        text: "22"
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Verbose / Debug
- id: error-timeout-handling
  intent: Use curl with a maximum time limit of 3 seconds to request
    https://httpbin.org/delay/1. Save the response to bench-timeout-ok.json.
    This should succeed since the delay (1s) is within the timeout (3s).
  assert:
    - ran: curl.*--max-time|curl.*-m
    - file_exists: bench-timeout-ok.json
    - file_contains:
        path: bench-timeout-ok.json
        text: httpbin.org
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: error-retry-request
  intent: Use curl with --retry 3 to fetch https://httpbin.org/get. Save the
    output to bench-retry-result.json. Also use --write-out to append the HTTP
    status code to a file called bench-retry-status.txt.
  assert:
    - ran: curl.*--retry
    - file_exists: bench-retry-result.json
    - file_exists: bench-retry-status.txt
    - file_contains:
        path: bench-retry-status.txt
        text: "200"
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: workflow-cookie-session
  intent: Simulate a session with cookies. First, use curl to visit
    https://httpbin.org/cookies/set/bench_session/abc123 with -L to follow the
    redirect, saving cookies to bench-cookies.txt. Then make a second request to
    https://httpbin.org/cookies using those saved cookies, and save the response
    to bench-cookie-check.json.
  assert:
    - ran: curl
    - file_exists: bench-cookies.txt
    - file_exists: bench-cookie-check.json
    - file_contains:
        path: bench-cookie-check.json
        text: bench_session
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#HTTP Cookies
- id: workflow-follow-and-measure
  intent: Use curl to follow redirects from https://httpbin.org/redirect/3
    silently. Use --write-out to capture the total time, number of redirects,
    and final HTTP status code. Write these three values (one per line, labeled)
    to bench-metrics.txt. Also save the final response body to bench-final.json.
  assert:
    - ran: curl.*-L|curl.*--location
    - ran: curl.*-w|curl.*--write-out
    - file_exists: bench-metrics.txt
    - file_exists: bench-final.json
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#Detailed Information
- id: workflow-post-then-verify
  intent: "First, POST JSON data '{\"id\": 42, \"name\": \"bench-item\"}' to
    https://httpbin.org/post using curl and save the response to
    bench-post-response.json. Then extract the 'id' value from the response (the
    server echoes it back in the 'json' field) and verify it equals 42 by
    writing 'PASS' or 'FAIL' to bench-verify.txt."
  assert:
    - ran: curl
    - file_exists: bench-post-response.json
    - file_exists: bench-verify.txt
    - file_contains:
        path: bench-verify.txt
        text: PASS
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#POST (HTTP)
- id: workflow-form-upload
  intent: Create a text file bench-upload.txt with the content 'Hello from
    CLIWatch'. Then use curl to upload it as a multipart form field named 'file'
    to https://httpbin.org/post. Save the response to bench-upload-result.json,
    which should contain the file contents echoed back.
  assert:
    - ran: curl.*-F|curl.*--form
    - file_exists: bench-upload.txt
    - file_exists: bench-upload-result.json
    - file_contains:
        path: bench-upload-result.json
        text: Hello from CLIWatch
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#POST (HTTP)
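
The `assert` entries in the suite pair a check type with an expected value. The sketch below shows one way such checks could be evaluated in shell. It is an assumption about their semantics, not the `@cliwatch/cli-bench` implementation: it treats `ran` as an extended regular expression matched against the commands the agent executed, and `output_contains` and `file_contains` as literal, case-sensitive substring tests. All function names are hypothetical.

```shell
# Hypothetical assertion helpers; each returns 0 on pass, non-zero on fail.

# ran: extended regex matched against the command history (one command per line).
assert_ran() {              # usage: assert_ran PATTERN HISTORY_TEXT
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# file_exists: the path must be a regular file.
assert_file_exists() {      # usage: assert_file_exists PATH
  [ -f "$1" ]
}

# file_contains: literal substring search inside the file.
assert_file_contains() {    # usage: assert_file_contains PATH TEXT
  [ -f "$1" ] && grep -Fq -- "$2" "$1"
}

# output_contains: literal substring search in captured stdout.
assert_output_contains() {  # usage: assert_output_contains TEXT OUTPUT
  printf '%s\n' "$2" | grep -Fq -- "$1"
}
```

Under these assumptions, a task passes only when every helper for its `assert` list returns 0, for example `assert_ran 'curl.*-f|curl.*--fail' "$history" && assert_file_contains bench-exit-code.txt 22`, where `$history` is a hypothetical variable holding the executed commands.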
Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 18 tasks using @cliwatch/cli-bench.
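
The 89% pass rate reported for gpt-5-nano is consistent with the per-task results: 16 of the 18 tasks are marked ✓. A minimal sketch of that arithmetic follows, assuming the pass rate is simply passed tasks over total tasks, rounded to the nearest whole percent. The page does not state the exact formula, and `pass_rate` is a hypothetical name.

```shell
# Hypothetical helper: whole-percent pass rate with round-half-up.
pass_rate() {  # usage: pass_rate PASSED TOTAL
  # Integer rounding: (100 * passed + total / 2) / total
  echo $(( (100 * $1 + $2 / 2) / $2 ))
}

pass_rate 16 18   # prints 89
```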
What you get with CLIWatch
Everything below is running live for curl — see the latest run. Set up the same for your CLI in minutes.
| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |
CI & PR Comments
Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
Track Over Time
See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates
Set per-model pass rate thresholds. CI fails if evals drop below your targets.
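
A gate of this kind can be sketched as a comparison of each model's pass rate against its threshold, with a non-zero exit status when any model falls short. The snippet below is an illustrative assumption, not CLIWatch's implementation. `check_gate` and its `MODEL:PASS_RATE:THRESHOLD` argument format are hypothetical, and the numbers are the sample pass rates and thresholds shown on this page, assuming both refer to the same three models.

```shell
# Hypothetical gate: each argument is MODEL:PASS_RATE:THRESHOLD (whole percents).
# Returns non-zero if any model's pass rate is below its threshold.
check_gate() {
  local failed=0
  local entry model rest rate threshold
  for entry in "$@"; do
    # Split "model:rate:threshold" with parameter expansion.
    model=${entry%%:*}
    rest=${entry#*:}
    rate=${rest%%:*}
    threshold=${rest#*:}
    if [ "$rate" -lt "$threshold" ]; then
      printf 'FAIL %s: %s%% < %s%%\n' "$model" "$rate" "$threshold"
      failed=1
    else
      printf 'ok   %s: %s%% >= %s%%\n' "$model" "$rate" "$threshold"
    fi
  done
  return "$failed"
}

check_gate claude-sonnet-4-5:95:80 gpt-4.1:80:75 claude-haiku-4-5:65:60
```

With the sample numbers every model clears its threshold, so the call returns 0. In a CI job, a non-zero return from a step like this would fail the build.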
Get this for your CLI
Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.