# agent queries a database
$ psql -c "SELECT name, score FROM users ORDER BY score DESC LIMIT 3"
 name  | score
-------+-------
 alice |    95
 bob   |    87

Can AI agents use psql?

psql is the PostgreSQL interactive terminal. Agents use it to run queries, manage schemas, import and export data, and administer databases.

50% overall pass rate · 1 model tested · 4 tasks · version psql (PostgreSQL) 16.13 (Ubuntu 16.13-1.pgdg24.04+1) · 3/6/2026

psql eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
| --- | --- | --- | --- |
| gpt-5-nano | 50% | 5.5 | 12.1k |

psql task results by model

| Task | Difficulty | Intent | gpt-5-nano |
| --- | --- | --- | --- |
| insert-and-query | medium | Using psql, create a table called 'products' with columns id (serial), name (text), and price (numeric). Insert three products: 'Widget' at 9.99, 'Gadget' at 24.50, and 'Gizmo' at 14.75. Then query all products ordered by price descending and save the output to query-result.txt. | 6t |
| export-csv | medium | Using psql, create a table 'scores' with columns name (text) and points (int). Insert 5 rows of sample data. Export the table to a CSV file called scores.csv using the COPY command or \copy. | 5t |
| create-table | medium | Using psql, create a table called 'users' with columns: id (serial primary key), name (varchar 100, not null), email (varchar 255, unique), and created_at (timestamp with default now()). Then verify the table exists by listing the table structure. | 2t |
| create-sql-script | easy | Create a SQL script file called 'setup.sql' that creates a 'tasks' table (id serial, title text not null, done boolean default false) and inserts 3 sample tasks. Then execute the script using psql. | 5t |
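
As a rough illustration of what a passing run of the export-csv task might look like, the sketch below drives psql non-interactively. The `bench_test3` database name comes from the task suite's setup step; the `run_sql` and `export_scores` helpers, the `PSQL` override variable, and the sample rows are assumptions made for this example and are not part of the benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one SQL statement or meta-command non-interactively.
# -X skips ~/.psqlrc; ON_ERROR_STOP=1 makes psql exit non-zero on SQL errors.
# PSQL may be overridden (for example, for a dry run); it defaults to psql.
run_sql() {
  local db="$1" sql="$2"
  "${PSQL:-psql}" -X -v ON_ERROR_STOP=1 -d "$db" -c "$sql"
}

# Hypothetical walk-through of export-csv: create the table, insert five
# sample rows, then export with \copy. \copy writes the file on the client
# side, so no server-side file permissions are needed.
export_scores() {
  local db="${1:-bench_test3}"
  run_sql "$db" "CREATE TABLE IF NOT EXISTS scores (name text, points int)" &&
  run_sql "$db" "INSERT INTO scores VALUES ('ann', 10), ('ben', 20), ('cy', 30), ('dee', 40), ('eli', 50)" &&
  run_sql "$db" "\\copy scores TO 'scores.csv' WITH (FORMAT csv, HEADER)"
}
```

Calling `export_scores` against a running server that has the `bench_test3` database should leave `scores.csv` in the current directory, which is what the task's `file_exists` assertion looks for.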
Task suite source · 55 lines · YAML
- id: create-table
  intent: "Using psql, create a table called 'users' with columns: id (serial
    primary key), name (varchar 100, not null), email (varchar 255, unique), and
    created_at (timestamp with default now()). Then verify the table exists by
    listing the table structure."
  assert:
    - ran: psql.*CREATE TABLE|psql.*create table
    - ran: psql.*\\d|psql.*\\dt
  setup:
    - sudo -u postgres createdb bench_test 2>/dev/null || true
  max_turns: 5
  difficulty: medium
  category: schema
- id: insert-and-query
  intent: "Using psql, create a table called 'products' with columns id (serial),
    name (text), and price (numeric). Insert three products: 'Widget' at 9.99,
    'Gadget' at 24.50, and 'Gizmo' at 14.75. Then query all products ordered by
    price descending and save the output to query-result.txt."
  assert:
    - ran: psql.*INSERT|psql.*insert
    - ran: psql.*SELECT|psql.*select
    - file_exists: query-result.txt
  setup:
    - sudo -u postgres createdb bench_test2 2>/dev/null || true
  max_turns: 6
  difficulty: medium
  category: query
- id: export-csv
  intent: Using psql, create a table 'scores' with columns name (text) and points
    (int). Insert 5 rows of sample data. Export the table to a CSV file called
    scores.csv using the COPY command or \copy.
  assert:
    - ran: psql
    - file_exists: scores.csv
  setup:
    - sudo -u postgres createdb bench_test3 2>/dev/null || true
  max_turns: 6
  difficulty: medium
  category: export
- id: create-sql-script
  intent: Create a SQL script file called 'setup.sql' that creates a 'tasks' table
    (id serial, title text not null, done boolean default false) and inserts 3
    sample tasks. Then execute the script using psql.
  assert:
    - file_exists: setup.sql
    - file_contains:
        path: setup.sql
        text: CREATE TABLE
    - ran: psql.*setup.sql|psql.*-f
  setup:
    - sudo -u postgres createdb bench_test4 2>/dev/null || true
  max_turns: 5
  difficulty: easy
  category: workflow
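
The `assert` entries above use three kinds of checks: `ran` (a pattern matched against commands the agent executed), `file_exists`, and `file_contains`. How cli-bench evaluates them is not shown here, so the following is only a minimal sketch. It assumes `ran` is an extended regular expression applied to a log of executed commands, one per line; the function names and the log format are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the agent's executed commands are logged one per line in a file.

# ran: succeed if any logged command matches the extended regex.
assert_ran() {
  local pattern="$1" log="$2"
  grep -Eq -- "$pattern" "$log"
}

# file_exists: succeed if the path is a regular file.
assert_file_exists() {
  [ -f "$1" ]
}

# file_contains: succeed if the file contains the literal text.
assert_file_contains() {
  local path="$1" text="$2"
  grep -Fq -- "$text" "$path"
}
```

Under this reading, the pattern `psql.*\\d` matches a logged command such as `psql -d bench_test -c "\d users"`, because `\\` in the regex stands for one literal backslash.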

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for psql (see the latest run). Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
| --- | --- | --- |
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
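
A gate of this kind could be sketched as below. This is not CLIWatch's implementation: the `check_gate` function and its input format (one `model pass_rate threshold` triple per line, as integer percentages) are assumptions for illustration. The example pairs the thresholds above with the sample pass rates from the PR comment table, which is also an assumption about how display names map to model keys.

```shell
#!/usr/bin/env bash
# Hypothetical gate: reads "model pass_rate threshold" triples (integer
# percentages) on stdin, prints one line per model, and returns non-zero
# if any model falls below its threshold.
check_gate() {
  local failed=0 model rate threshold
  while read -r model rate threshold; do
    [ -z "$model" ] && continue   # skip blank lines
    if [ "$rate" -lt "$threshold" ]; then
      echo "FAIL $model: ${rate}% < ${threshold}%"
      failed=1
    else
      echo "PASS $model: ${rate}% >= ${threshold}%"
    fi
  done
  return "$failed"
}

# Example input: sample pass rates paired with the thresholds shown above.
check_gate <<'EOF'
claude-sonnet-4-5 95 80
gpt-4.1 80 75
claude-haiku-4-5 65 60
EOF
```

In a CI job, the non-zero return status is what would fail the build when a model drops below its target.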

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.
