    # agent deploys to production
    $ vercel --prod --yes
    Deploying to production...
    https://app.example.com
    $ vercel ls --json
    [{"url":"app-abc123.vercel.app","state":"READY"}]
Can AI agents use Vercel?
Frontend deployment platform CLI. Agents deploy projects, manage environment variables, and configure domains.
Vercel eval results by model
| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 50% | 2.4 | 11.4k |
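
The summary row can be cross-checked against the per-task results listed on this page. The sketch below is a minimal tally, assuming the `Nt` suffix in the task table means "turns used"; the `RESULTS` list and the `summarize` helper are illustrative names, not part of CLIWatch. Under that assumption, the reported 2.4 average matches the mean over passing tasks only, not over all 14 tasks.

```python
# Per-task outcomes for gpt-5-nano, copied from the task results table:
# (task id, passed, turns used). "turns used" is our reading of the "Nt" suffix.
RESULTS = [
    ("show-help", True, 3),
    ("env-file", True, 4),
    ("version-check", True, 1),
    ("subcommand-help", False, 3),
    ("create-vercel-json", True, 2),
    ("create-project-structure", False, 6),
    ("headers-config", True, 5),
    ("redirects-config", False, 2),
    ("cron-config", False, 1),
    ("framework-detection", False, 6),
    ("deploy-no-auth", True, 1),
    ("env-pull-no-project", True, 1),
    ("full-spa-config", False, 7),
    ("api-routes-project", False, 6),
]


def summarize(results):
    """Return pass rate and average turns (passing tasks only, and all tasks)."""
    passing_turns = [turns for _, ok, turns in results if ok]
    all_turns = [turns for _, _, turns in results]
    return {
        "pass_rate": len(passing_turns) / len(results),
        # Guard against division by zero when nothing passed.
        "avg_turns_passing": (
            sum(passing_turns) / len(passing_turns) if passing_turns else 0.0
        ),
        "avg_turns_all": sum(all_turns) / len(all_turns),
    }


summary = summarize(RESULTS)
print(f"pass rate:            {summary['pass_rate']:.0%}")
print(f"avg turns (passing):  {summary['avg_turns_passing']:.1f}")
print(f"avg turns (all):      {summary['avg_turns_all']:.1f}")
```
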
Vercel task results by model
| Task | Difficulty | Intent | gpt-5-nano |
|---|---|---|---|
| show-help | easy | Show the top-level Vercel CLI help to see all available commands. | ✓ 3t |
| env-file | easy | Create a .env.local file with DATABASE_URL=postgres://localhost:5432/mydb and API_KEY=test123. Also create a .vercel/.gitignore that ignores everything in the .vercel directory. | ✓ 4t |
| version-check | easy | Display the installed Vercel CLI version. | ✓ 1t |
| subcommand-help | easy | Show the help for the 'vercel env' subcommand to see how to manage environment variables. | ✗ 3t |
| create-vercel-json | medium | Create a vercel.json configuration file that sets the build command to 'npm run build', the output directory to 'dist', and adds a rewrite rule that sends all requests to /index.html. | ✓ 2t |
| create-project-structure | medium | Create a minimal Next.js project structure: package.json (with next as dependency and build/dev scripts), next.config.js, and pages/index.js that exports a simple component. | ✗ 6t |
| headers-config | medium | Create a vercel.json with custom headers: add Cache-Control 'public, max-age=31536000' for all files under /static/* and X-Frame-Options 'DENY' for all routes. | ✓ 5t |
| redirects-config | medium | Create a vercel.json with redirect rules: /blog/:slug to /posts/:slug (308 permanent) and /old-page to /new-page (307 temporary). | ✗ 2t |
| cron-config | medium | Create a vercel.json that configures a cron job to hit /api/cleanup every day at midnight UTC. | ✗ 1t |
| framework-detection | medium | Create a minimal Remix project structure with package.json (remix dependencies), remix.config.js, and app/root.tsx so that Vercel would auto-detect the Remix framework. | ✗ 6t |
| deploy-no-auth | hard | Attempt to run 'vercel deploy' without authentication. Report the error message you get back. | ✓ 1t |
| env-pull-no-project | hard | Attempt to run 'vercel env pull' without a linked project. Report the error. | ✓ 1t |
| full-spa-config | hard | Create a complete Vercel SPA deployment config: vercel.json with rewrites (/* to /index.html), headers (Cache-Control for /assets/*), and a .env.local with APP_NAME=MySPA and NODE_ENV=production. | ✗ 7t |
| api-routes-project | hard | Create a Next.js project with API routes: package.json (with next dependency), pages/api/hello.js (returns JSON greeting), pages/api/health.js (returns JSON with status ok), and a vercel.json with the build command set to 'next build'. | ✗ 6t |
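
For reference, here is one way the `redirects-config` and `cron-config` intents could be expressed. This is a sketch, not a graded answer: it uses the `redirects` and `crons` keys of `vercel.json` as we understand them from Vercel's configuration docs, and it combines both tasks into one file purely for brevity. It spells the status codes out with `statusCode` rather than `permanent: true`, because the suite's `redirects-config` assertion looks for the literal text `308` in the file. The `CONFIG` and `render_vercel_json` names are illustrative.

```python
import json

# Assumed vercel.json shape: "redirects" entries with source/destination and an
# explicit statusCode, and "crons" entries with a path and a cron schedule.
CONFIG = {
    "redirects": [
        {"source": "/blog/:slug", "destination": "/posts/:slug", "statusCode": 308},
        {"source": "/old-page", "destination": "/new-page", "statusCode": 307},
    ],
    "crons": [
        # "0 0 * * *" is the standard cron expression for midnight every day.
        {"path": "/api/cleanup", "schedule": "0 0 * * *"},
    ],
}


def render_vercel_json(config):
    """Serialize the config the way it would be written to vercel.json."""
    return json.dumps(config, indent=2) + "\n"


vercel_json_text = render_vercel_json(CONFIG)
print(vercel_json_text)
```

Writing `vercel_json_text` to `vercel.json` in the project root is left to the caller.
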
Task suite source · 192 lines · YAML
- id: show-help
  intent: Show the top-level Vercel CLI help to see all available commands.
  assert:
    - ran: vercel.*--help|vercel help
    - exit_code: 0
  setup: []
  max_turns: 3
  difficulty: easy
  category: discovery
- id: env-file
  intent: Create a .env.local file with
    DATABASE_URL=postgres://localhost:5432/mydb and API_KEY=test123. Also create
    a .vercel/.gitignore that ignores everything in the .vercel directory.
  assert:
    - file_exists: .env.local
    - file_contains:
        path: .env.local
        text: DATABASE_URL
    - file_contains:
        path: .env.local
        text: API_KEY
  setup: []
  max_turns: 5
  difficulty: easy
  category: config
- id: version-check
  intent: Display the installed Vercel CLI version.
  assert:
    - ran: vercel.*--version|vercel version
    - exit_code: 0
  setup: []
  max_turns: 3
  difficulty: easy
  category: discovery
- id: subcommand-help
  intent: Show the help for the 'vercel env' subcommand to see how to manage
    environment variables.
  assert:
    - ran: vercel env.*--help|vercel help env
    - exit_code: 0
  setup: []
  max_turns: 3
  difficulty: easy
  category: discovery
- id: create-vercel-json
  intent: Create a vercel.json configuration file that sets the build command to
    'npm run build', the output directory to 'dist', and adds a rewrite rule
    that sends all requests to /index.html.
  assert:
    - file_exists: vercel.json
    - file_contains:
        path: vercel.json
        text: npm run build
    - file_contains:
        path: vercel.json
        text: rewrites
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
- id: create-project-structure
  intent: "Create a minimal Next.js project structure: package.json (with next as
    dependency and build/dev scripts), next.config.js, and pages/index.js that
    exports a simple component."
  assert:
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: next
    - file_exists: pages/index.js
  setup: []
  max_turns: 8
  difficulty: medium
  category: workflow
- id: headers-config
  intent: "Create a vercel.json with custom headers: add Cache-Control 'public,
    max-age=31536000' for all files under /static/* and X-Frame-Options 'DENY'
    for all routes."
  assert:
    - file_exists: vercel.json
    - file_contains:
        path: vercel.json
        text: headers
    - file_contains:
        path: vercel.json
        text: Cache-Control
    - file_contains:
        path: vercel.json
        text: X-Frame-Options
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
- id: redirects-config
  intent: "Create a vercel.json with redirect rules: /blog/:slug to /posts/:slug
    (308 permanent) and /old-page to /new-page (307 temporary)."
  assert:
    - file_exists: vercel.json
    - file_contains:
        path: vercel.json
        text: redirects
    - file_contains:
        path: vercel.json
        text: "308"
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
- id: cron-config
  intent: Create a vercel.json that configures a cron job to hit /api/cleanup
    every day at midnight UTC.
  assert:
    - file_exists: vercel.json
    - file_contains:
        path: vercel.json
        text: crons
    - file_contains:
        path: vercel.json
        text: /api/cleanup
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
- id: framework-detection
  intent: Create a minimal Remix project structure with package.json (remix
    dependencies), remix.config.js, and app/root.tsx so that Vercel would
    auto-detect the Remix framework.
  assert:
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: remix
    - file_exists: app/root.tsx
  setup: []
  max_turns: 8
  difficulty: medium
  category: workflow
- id: deploy-no-auth
  intent: Attempt to run 'vercel deploy' without authentication. Report the error
    message you get back.
  assert:
    - ran: vercel deploy|vercel$
  setup: []
  max_turns: 5
  difficulty: hard
  category: error-handling
- id: env-pull-no-project
  intent: Attempt to run 'vercel env pull' without a linked project. Report the error.
  assert:
    - ran: vercel env pull
  setup: []
  max_turns: 5
  difficulty: hard
  category: error-handling
- id: full-spa-config
  intent: "Create a complete Vercel SPA deployment config: vercel.json with
    rewrites (/* to /index.html), headers (Cache-Control for /assets/*), and a
    .env.local with APP_NAME=MySPA and NODE_ENV=production."
  assert:
    - file_exists: vercel.json
    - file_contains:
        path: vercel.json
        text: rewrites
    - file_contains:
        path: vercel.json
        text: headers
    - file_exists: .env.local
    - file_contains:
        path: .env.local
        text: APP_NAME
  setup: []
  max_turns: 10
  difficulty: hard
  category: workflow
- id: api-routes-project
  intent: "Create a Next.js project with API routes: package.json (with next
    dependency), pages/api/hello.js (returns JSON greeting), pages/api/health.js
    (returns JSON with status ok), and a vercel.json with the build command set
    to 'next build'."
  assert:
    - file_exists: pages/api/hello.js
    - file_exists: pages/api/health.js
    - file_exists: vercel.json
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: next
  setup: []
  max_turns: 10
  difficulty: hard
  category: workflow
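
The suite uses four assertion kinds: `ran` (a regex over executed commands), `exit_code`, `file_exists`, and `file_contains`. The sketch below shows one plausible way such assertions could be evaluated. It is not the `@cliwatch/cli-bench` implementation; the semantics (regex search against each command, substring match for `file_contains`) and the `check_assertion` / `check_task` helpers are assumptions for illustration. Tasks are written as plain dicts to avoid a YAML dependency.

```python
import re
import tempfile
from pathlib import Path


def check_assertion(assertion, workdir, commands, exit_code):
    """Evaluate one single-key assert entry (assumed semantics, see lead-in)."""
    kind, arg = next(iter(assertion.items()))
    if kind == "ran":
        # Passes if any executed command matches the regex anywhere.
        pattern = re.compile(arg)
        return any(pattern.search(cmd) for cmd in commands)
    if kind == "exit_code":
        return exit_code == arg
    if kind == "file_exists":
        return (Path(workdir) / arg).is_file()
    if kind == "file_contains":
        target = Path(workdir) / arg["path"]
        return target.is_file() and arg["text"] in target.read_text()
    raise ValueError(f"unknown assertion kind: {kind}")


def check_task(task, workdir, commands, exit_code):
    """A task passes only if every entry under 'assert' holds."""
    return all(
        check_assertion(a, workdir, commands, exit_code) for a in task["assert"]
    )


# The cron-config task from the suite, as a dict.
cron_task = {
    "id": "cron-config",
    "assert": [
        {"file_exists": "vercel.json"},
        {"file_contains": {"path": "vercel.json", "text": "crons"}},
        {"file_contains": {"path": "vercel.json", "text": "/api/cleanup"}},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    before = check_task(cron_task, tmp, [], None)  # no vercel.json yet
    (Path(tmp) / "vercel.json").write_text(
        '{"crons": [{"path": "/api/cleanup", "schedule": "0 0 * * *"}]}'
    )
    after = check_task(cron_task, tmp, [], None)

print("cron-config before:", before)
print("cron-config after: ", after)
```

Note that under these semantics the file checks are purely textual: a `vercel.json` that mentions `crons` and `/api/cleanup` passes whether or not the schedule is correct.
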
Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 14 tasks using @cliwatch/cli-bench.
What you get with CLIWatch
Everything below is running live for Vercel — see the latest run. Set up the same for your CLI in minutes.
| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |
CI & PR Comments
Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
Track Over Time
See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%
Quality Gates
Set per-model pass rate thresholds. CI fails if evals drop below your targets.
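
A threshold gate of this kind reduces to a comparison per model. The sketch below is a minimal illustration, not CLIWatch's actual CI logic: it reuses the threshold values shown above and the sample pass rates from the dashboard table on this page, and it assumes the display names map to the model IDs as written. `failing_models` is a hypothetical helper.

```python
# Thresholds from the sample config, as whole percentages.
THRESHOLDS = {"claude-sonnet-4-5": 80, "gpt-4.1": 75, "claude-haiku-4-5": 60}

# Sample pass rates from the dashboard table (name-to-ID mapping is assumed).
PASS_RATES = {"claude-sonnet-4-5": 95, "gpt-4.1": 80, "claude-haiku-4-5": 65}


def failing_models(pass_rates, thresholds):
    """Return (model, rate, minimum) for each model below its threshold.

    A model with a threshold but no recorded pass rate counts as failing.
    """
    failures = []
    for model, minimum in thresholds.items():
        rate = pass_rates.get(model)
        if rate is None or rate < minimum:
            failures.append((model, rate, minimum))
    return failures


failures = failing_models(PASS_RATES, THRESHOLDS)
for model, rate, minimum in failures:
    print(f"FAIL {model}: {rate}% < {minimum}%")

# In CI this value would be passed to sys.exit() so the job fails on regressions.
exit_code = 1 if failures else 0
print("gate:", "failed" if exit_code else "passed")
```

With the sample numbers every model clears its threshold, so the gate passes.
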
Get this for your CLI
Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.