# agent queries a database
$ psql -c "SELECT name, score FROM users ORDER BY score DESC LIMIT 3"
   name  | score
  -------+-------
   alice |    95
   bob   |    87

Can AI agents use psql?

The PostgreSQL interactive terminal. Agents use it to run queries, manage schemas, import/export data, and administer databases.

Docs →GitHub →

See the latest run →

50% overall pass rate1 model tested4 tasksvpsql (PostgreSQL) 16.13 (Ubuntu 16.13-1.pgdg24.04+1)3/6/2026

psql eval results by model

Model	Pass rate	Avg turns	Avg tokens
gpt-5-nano	50%	5.5	12.1k

psql task results by model

Task	gpt-5-nano
insert-and-querymedium Using psql, create a table called 'products' with columns id (serial), name (text), and price (numeric). Insert three products: 'Widget' at 9.99, 'Gadget' at 24.50, and 'Gizmo' at 14.75. Then query all products ordered by price descending and save the output to query-result.txt.	✓6t6 turns · 17.3k tokens
export-csvmedium Using psql, create a table 'scores' with columns name (text) and points (int). Insert 5 rows of sample data. Export the table to a CSV file called scores.csv using the COPY command or \copy.	✓5t5 turns · 12.6k tokens
create-tablemedium Using psql, create a table called 'users' with columns: id (serial primary key), name (varchar 100, not null), email (varchar 255, unique), and created_at (timestamp with default now()). Then verify the table exists by listing the table structure.	✗2t2 turns · 7.1k tokens
create-sql-scripteasy Create a SQL script file called 'setup.sql' that creates a 'tasks' table (id serial, title text not null, done boolean default false) and inserts 3 sample tasks. Then execute the script using psql.	✗5t5 turns · 11.5k tokens

Task suite source55 lines · YAML

- id: create-table
  intent: "Using psql, create a table called 'users' with columns: id (serial
    primary key), name (varchar 100, not null), email (varchar 255, unique), and
    created_at (timestamp with default now()). Then verify the table exists by
    listing the table structure."
  assert:
    - ran: psql.*CREATE TABLE|psql.*create table
    - ran: psql.*\\d|psql.*\\dt
  setup:
    - sudo -u postgres createdb bench_test 2>/dev/null || true
  max_turns: 5
  difficulty: medium
  category: schema
- id: insert-and-query
  intent: "Using psql, create a table called 'products' with columns id (serial),
    name (text), and price (numeric). Insert three products: 'Widget' at 9.99,
    'Gadget' at 24.50, and 'Gizmo' at 14.75. Then query all products ordered by
    price descending and save the output to query-result.txt."
  assert:
    - ran: psql.*INSERT|psql.*insert
    - ran: psql.*SELECT|psql.*select
    - file_exists: query-result.txt
  setup:
    - sudo -u postgres createdb bench_test2 2>/dev/null || true
  max_turns: 6
  difficulty: medium
  category: query
- id: export-csv
  intent: Using psql, create a table 'scores' with columns name (text) and points
    (int). Insert 5 rows of sample data. Export the table to a CSV file called
    scores.csv using the COPY command or \copy.
  assert:
    - ran: psql
    - file_exists: scores.csv
  setup:
    - sudo -u postgres createdb bench_test3 2>/dev/null || true
  max_turns: 6
  difficulty: medium
  category: export
- id: create-sql-script
  intent: Create a SQL script file called 'setup.sql' that creates a 'tasks' table
    (id serial, title text not null, done boolean default false) and inserts 3
    sample tasks. Then execute the script using psql.
  assert:
    - file_exists: setup.sql
    - file_contains:
        path: setup.sql
        text: CREATE TABLE
    - ran: psql.*setup.sql|psql.*-f
  setup:
    - sudo -u postgres createdb bench_test4 2>/dev/null || true
  max_turns: 5
  difficulty: easy
  category: workflow

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for psql — see the latest run. Set up the same for your CLI in minutes.

Model	Pass Rate	Delta
Sonnet 4.5	95%	+5%
GPT-4.1	80%	-5%
Haiku 4.5	65%	-10%

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

Pass rateLast 30 days

v1.0v1.6

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.

Start Free Read the guide

Compare other CLI evals

git

npm

aws

fly