# agent tests an API endpoint
$ curl -s https://httpbin.org/get | jq .origin
  "203.0.113.42"
 
$ curl -w '%{http_code}' -o /dev/null -s https://example.com
  200

Can AI agents use curl?

curl is the universal command-line HTTP client. Agents use it to make API calls, download files, test endpoints, and debug HTTP traffic.

See the latest run →
89% overall pass rate · 1 model tested · 18 tasks · v8.5.0 · 3/6/2026

curl eval results by model

Model | Pass rate | Avg turns | Avg tokens
gpt-5-nano | 89% | 2.8 | 7.9k

curl task results by model

Task | gpt-5-nano
flags-json-post · medium
Use curl's --json flag to POST the data '{"task": "benchmark", "score": 95}' to https://httpbin.org/post. Save the response to bench-json-result.json.
3t
flags-dump-header-separate · medium
Use curl to fetch https://httpbin.org/response-headers?X-Bench-Tag=hello. Save the response body to bench-body.json and the response headers to a separate file bench-resp-headers.txt using the --dump-header flag.
2t
error-fail-on-http-error · hard
Use curl with the --fail flag to request https://httpbin.org/status/404. Since it returns a 404, curl should exit with a non-zero exit code. Capture the exit code in a file called bench-exit-code.txt (just the number).
2t
error-timeout-handling · hard
Use curl with a maximum time limit of 3 seconds to request https://httpbin.org/delay/1. Save the response to bench-timeout-ok.json. This should succeed since the delay (1s) is within the timeout (3s).
3t
error-retry-request · hard
Use curl with --retry 3 to fetch https://httpbin.org/get. Save the output to bench-retry-result.json. Also use --write-out to append the HTTP status code to a file called bench-retry-status.txt.
2t
workflow-cookie-session · hard
Simulate a session with cookies. First, use curl to visit https://httpbin.org/cookies/set/bench_session/abc123 with -L to follow the redirect, saving cookies to bench-cookies.txt. Then make a second request to https://httpbin.org/cookies using those saved cookies, and save the response to bench-cookie-check.json.
3t
quickstart-simple-get · easy
Use curl to fetch the page at https://httpbin.org/get and print the response to stdout.
1t
quickstart-save-to-file · easy
Use curl to download the JSON response from https://httpbin.org/get and save it to a file called bench-response.json.
3t
quickstart-verbose-request · easy
Use curl in verbose mode to fetch https://httpbin.org/get. The verbose output should show the request and response headers.
2t
discover-version-protocols · easy
Check what version of curl is installed and what protocols it supports. Print the full version output.
1t
discover-help-categories · easy
Show the curl help categories. curl organizes its flags by category (e.g., http, ftp, tls). List the available categories.
4t
config-file-request · medium
Create a curl config file called bench-curl.conf that sets the User-Agent to 'BenchBot/1.0', enables silent mode, and follows redirects. Then use curl with that config file to fetch https://httpbin.org/user-agent and save the output to bench-agent.json.
3t
config-custom-headers · medium
Use curl to send a GET request to https://httpbin.org/headers with three custom headers: X-Request-ID set to 'bench-123', Accept set to 'application/json', and X-Custom-Auth set to 'token-abc'. Save the response to bench-headers.json.
2t
flags-head-request · medium
Use curl to send a HEAD request to https://httpbin.org/get (fetching only the response headers, not the body). Save the headers to a file called bench-head-headers.txt.
2t
flags-write-out-status · medium
Use curl's --write-out (or -w) option to fetch https://httpbin.org/status/201 silently and print only the HTTP status code to stdout. Do not print the response body.
1t
workflow-follow-and-measure · hard
Use curl to follow redirects from https://httpbin.org/redirect/3 silently. Use --write-out to capture the total time, number of redirects, and final HTTP status code. Write these three values (one per line, labeled) to bench-metrics.txt. Also save the final response body to bench-final.json.
7t
workflow-post-then-verify · hard
First, POST JSON data '{"id": 42, "name": "bench-item"}' to https://httpbin.org/post using curl and save the response to bench-post-response.json. Then extract the 'id' value from the response (the server echoes it back in the 'json' field) and verify it equals 42 by writing 'PASS' or 'FAIL' to bench-verify.txt.
4t
workflow-form-upload · hard
Create a text file bench-upload.txt with the content 'Hello from CLIWatch'. Then use curl to upload it as a multipart form field named 'file' to https://httpbin.org/post. Save the response to bench-upload-result.json, which should contain the file contents echoed back.
3t
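
To make one of the tasks above concrete, here is a minimal sketch of how the error-fail-on-http-error task might be solved. It is an illustration only, not a transcript of what the evaluated model ran. The helper name `record_fail_exit_code` is hypothetical, and the sketch assumes a POSIX shell with curl on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of curl or the eval suite):
# request a URL with --fail and write curl's exit code to a file.
record_fail_exit_code() {
  url=$1
  out_file=$2
  code=0
  # --fail makes curl exit non-zero on HTTP error responses (the task's
  # assertion expects 22 for the 404). --silent hides the progress meter
  # and --output /dev/null discards any body.
  # "|| code=$?" captures the exit code without aborting under `set -e`.
  curl --fail --silent --output /dev/null "$url" || code=$?
  # The task asks for just the number in the file.
  printf '%s\n' "$code" > "$out_file"
}
```

Under these assumptions, the task would be covered by calling `record_fail_exit_code https://httpbin.org/status/404 bench-exit-code.txt`.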
Task suite source · 267 lines · YAML
- id: quickstart-simple-get
  intent: Use curl to fetch the page at https://httpbin.org/get and print the
    response to stdout.
  assert:
    - ran: curl
    - output_contains: httpbin.org
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Simple Usage
- id: quickstart-save-to-file
  intent: Use curl to download the JSON response from https://httpbin.org/get and
    save it to a file called bench-response.json.
  assert:
    - ran: curl
    - file_exists: bench-response.json
    - file_contains:
        path: bench-response.json
        text: httpbin.org
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Download to a File
- id: quickstart-verbose-request
  intent: Use curl in verbose mode to fetch https://httpbin.org/get. The verbose
    output should show the request and response headers.
  assert:
    - ran: curl.*-v|curl.*--verbose
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/MANUAL.md#Verbose / Debug
- id: discover-version-protocols
  intent: Check what version of curl is installed and what protocols it supports.
    Print the full version output.
  assert:
    - ran: curl.*--version|curl.*-V
    - output_contains: curl
    - output_contains: http
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: discover-help-categories
  intent: Show the curl help categories. curl organizes its flags by category
    (e.g., http, ftp, tls). List the available categories.
  assert:
    - ran: curl.*--help|curl.*-h
  setup: []
  max_turns: 4
  difficulty: easy
  category: command-discovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: config-file-request
  intent: Create a curl config file called bench-curl.conf that sets the
    User-Agent to 'BenchBot/1.0', enables silent mode, and follows redirects.
    Then use curl with that config file to fetch https://httpbin.org/user-agent
    and save the output to bench-agent.json.
  assert:
    - ran: curl
    - file_exists: bench-curl.conf
    - file_exists: bench-agent.json
    - file_contains:
        path: bench-agent.json
        text: BenchBot
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/MANUAL.md#Simple Usage
- id: config-custom-headers
  intent: "Use curl to send a GET request to https://httpbin.org/headers with
    three custom headers: X-Request-ID set to 'bench-123', Accept set to
    'application/json', and X-Custom-Auth set to 'token-abc'. Save the response
    to bench-headers.json."
  assert:
    - ran: curl
    - file_exists: bench-headers.json
    - file_contains:
        path: bench-headers.json
        text: bench-123
    - file_contains:
        path: bench-headers.json
        text: token-abc
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/MANUAL.md#HTTP
- id: flags-head-request
  intent: Use curl to send a HEAD request to https://httpbin.org/get (fetching
    only the response headers, not the body). Save the headers to a file called
    bench-head-headers.txt.
  assert:
    - ran: curl.*-I|curl.*--head|curl.*-D
    - file_exists: bench-head-headers.txt
    - file_contains:
        path: bench-head-headers.txt
        text: "200"
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: flags-write-out-status
  intent: Use curl's --write-out (or -w) option to fetch
    https://httpbin.org/status/201 silently and print only the HTTP status code
    to stdout. Do not print the response body.
  assert:
    - ran: curl.*-w|curl.*--write-out
    - output_contains: "201"
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: flags-json-post
  intent: "Use curl's --json flag to POST the data '{\"task\": \"benchmark\",
    \"score\": 95}' to https://httpbin.org/post. Save the response to
    bench-json-result.json."
  assert:
    - ran: curl.*--json|curl.*-d.*application/json
    - file_exists: bench-json-result.json
    - file_contains:
        path: bench-json-result.json
        text: benchmark
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#POST (HTTP)
- id: flags-dump-header-separate
  intent: Use curl to fetch
    https://httpbin.org/response-headers?X-Bench-Tag=hello. Save the response
    body to bench-body.json and the response headers to a separate file
    bench-resp-headers.txt using the --dump-header flag.
  assert:
    - ran: curl.*-D|curl.*--dump-header
    - file_exists: bench-body.json
    - file_exists: bench-resp-headers.txt
    - file_contains:
        path: bench-resp-headers.txt
        text: X-Bench-Tag
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/MANUAL.md#Detailed Information
- id: error-fail-on-http-error
  intent: Use curl with the --fail flag to request https://httpbin.org/status/404.
    Since it returns a 404, curl should exit with a non-zero exit code. Capture
    the exit code in a file called bench-exit-code.txt (just the number).
  assert:
    - ran: curl.*-f|curl.*--fail
    - file_exists: bench-exit-code.txt
    - file_contains:
        path: bench-exit-code.txt
        text: "22"
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Verbose / Debug
- id: error-timeout-handling
  intent: Use curl with a maximum time limit of 3 seconds to request
    https://httpbin.org/delay/1. Save the response to bench-timeout-ok.json.
    This should succeed since the delay (1s) is within the timeout (3s).
  assert:
    - ran: curl.*--max-time|curl.*-m
    - file_exists: bench-timeout-ok.json
    - file_contains:
        path: bench-timeout-ok.json
        text: httpbin.org
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: error-retry-request
  intent: Use curl with --retry 3 to fetch https://httpbin.org/get. Save the
    output to bench-retry-result.json. Also use --write-out to append the HTTP
    status code to a file called bench-retry-status.txt.
  assert:
    - ran: curl.*--retry
    - file_exists: bench-retry-result.json
    - file_exists: bench-retry-status.txt
    - file_contains:
        path: bench-retry-status.txt
        text: "200"
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/MANUAL.md#Simple Usage
- id: workflow-cookie-session
  intent: Simulate a session with cookies. First, use curl to visit
    https://httpbin.org/cookies/set/bench_session/abc123 with -L to follow the
    redirect, saving cookies to bench-cookies.txt. Then make a second request to
    https://httpbin.org/cookies using those saved cookies, and save the response
    to bench-cookie-check.json.
  assert:
    - ran: curl
    - file_exists: bench-cookies.txt
    - file_exists: bench-cookie-check.json
    - file_contains:
        path: bench-cookie-check.json
        text: bench_session
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#HTTP Cookies
- id: workflow-follow-and-measure
  intent: Use curl to follow redirects from https://httpbin.org/redirect/3
    silently. Use --write-out to capture the total time, number of redirects,
    and final HTTP status code. Write these three values (one per line, labeled)
    to bench-metrics.txt. Also save the final response body to bench-final.json.
  assert:
    - ran: curl.*-L|curl.*--location
    - ran: curl.*-w|curl.*--write-out
    - file_exists: bench-metrics.txt
    - file_exists: bench-final.json
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#Detailed Information
- id: workflow-post-then-verify
  intent: "First, POST JSON data '{\"id\": 42, \"name\": \"bench-item\"}' to
    https://httpbin.org/post using curl and save the response to
    bench-post-response.json. Then extract the 'id' value from the response (the
    server echoes it back in the 'json' field) and verify it equals 42 by
    writing 'PASS' or 'FAIL' to bench-verify.txt."
  assert:
    - ran: curl
    - file_exists: bench-post-response.json
    - file_exists: bench-verify.txt
    - file_contains:
        path: bench-verify.txt
        text: PASS
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#POST (HTTP)
- id: workflow-form-upload
  intent: Create a text file bench-upload.txt with the content 'Hello from
    CLIWatch'. Then use curl to upload it as a multipart form field named 'file'
    to https://httpbin.org/post. Save the response to bench-upload-result.json,
    which should contain the file contents echoed back.
  assert:
    - ran: curl.*-F|curl.*--form
    - file_exists: bench-upload.txt
    - file_exists: bench-upload-result.json
    - file_contains:
        path: bench-upload-result.json
        text: Hello from CLIWatch
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/MANUAL.md#POST (HTTP)
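
The page does not show how @cliwatch/cli-bench evaluates these assertions, so the following is only a sketch of one plausible reading. It assumes `ran` is a regular expression matched against the command line and `file_contains` is a plain substring check. The function names `ran_matches` and `file_contains` are hypothetical, and grep's extended regular expressions stand in for whatever dialect the real tool uses.

```shell
#!/bin/sh
# Hypothetical checkers; the real cli-bench implementation may differ.

# ran_matches PATTERN COMMAND_LINE
# Succeeds if the command line matches the extended regular expression.
ran_matches() {
  printf '%s\n' "$2" | grep -Eq -e "$1"
}

# file_contains PATH TEXT
# Succeeds if the file exists and contains TEXT as a fixed string.
file_contains() {
  [ -f "$1" ] && grep -Fq -e "$2" "$1"
}
```

Under this reading, `ran_matches 'curl.*-w|curl.*--write-out' "curl -s -w '%{http_code}' https://httpbin.org/status/201"` succeeds. Alternation applies to the whole pattern, so any alternative that should require curl needs its own `curl.*` prefix.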

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 18 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for curl (see the latest run). Set up the same for your CLI in minutes.

Model | Pass Rate | Delta
Sonnet 4.5 | 95% | +5%
GPT-4.1 | 80% | -5%
Haiku 4.5 | 65% | -10%

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
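
A gate of this kind can be sketched in a few lines of shell. This is not CLIWatch's implementation. The helpers `check_gate` and `run_gates` are hypothetical, the pass rates are the illustrative figures from the PR-comment example on this page, and mapping those display names to the threshold keys is an assumption.

```shell
#!/bin/sh
# Hypothetical gate check; values are whole-number percentages.

# check_gate MODEL PASS_RATE THRESHOLD
# Prints one line per model and fails if the pass rate is below the threshold.
check_gate() {
  if [ "$2" -lt "$3" ]; then
    echo "FAIL $1: $2% is below $3%"
    return 1
  fi
  echo "ok   $1: $2% meets $3%"
}

# Check every model so all regressions are reported, then return
# non-zero if any gate failed.
run_gates() {
  failed=0
  check_gate claude-sonnet-4-5 95 80 || failed=1
  check_gate gpt-4.1 80 75 || failed=1
  check_gate claude-haiku-4-5 65 60 || failed=1
  return "$failed"
}

run_gates
```

In a CI job, the non-zero exit status of `run_gates` is what would fail the build. With the example numbers, all three models clear their thresholds.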

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.
