    # agent manages a monorepo
    $ pnpm add lodash
    + lodash 4.17.21
    $ pnpm ls --depth=0
    dependencies:
    lodash 4.17.21
Can AI agents use pnpm?
Fast, disk-efficient JavaScript package manager. Agents use it to install dependencies, run scripts, and manage monorepo workspaces.
pnpm eval results by model
| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 63% | 3.7 | 8.1k |
pnpm task results by model
| Task | Difficulty | Intent | gpt-5-nano |
|---|---|---|---|
| quickstart-init-project | easy | Initialize a new Node.js project using pnpm init. The resulting package.json should exist in the current directory. | ✓ 3t |
| quickstart-add-dependency | easy | Initialize a project with pnpm init, then add 'is-odd' as a dependency using pnpm add. | ✗ 4t |
| quickstart-run-script | easy | Create a package.json that has a script named 'greet' which runs 'echo Hello from pnpm'. Then run that script with pnpm. | ✓ 2t |
| error-audit-check | hard | Initialize a project, add 'is-odd' as a dependency, then run pnpm audit to check for vulnerabilities. Save the audit output to bench-audit.txt. | ✗ 6t |
| discover-version | easy | Check what version of pnpm is installed and print it. | ✓ 1t |
| discover-help | easy | Show the pnpm help output to see all available commands. | ✓ 2t |
| config-dev-dependency | medium | Initialize a project and add 'typescript' as a dev dependency (not a production dependency). Verify it appears under devDependencies in package.json. | ✓ 5t |
| config-exact-version | medium | Initialize a project and add 'is-odd' with an exact version (no ^ or ~ prefix in package.json). Use the pnpm flag that saves the exact version. | ✗ 5t |
| config-workspace-setup | medium | Set up a pnpm workspace with two packages: packages/core and packages/utils. Each should have its own package.json (use pnpm init in each). Create a pnpm-workspace.yaml at the root that includes 'packages/*'. | ✗ 6t |
| flags-list-dependencies | medium | Initialize a project, add 'is-odd' and 'is-even' as dependencies, then list all installed dependencies at depth 0. | ✗ 6t |
| flags-global-bin-path | medium | Print the directory where pnpm installs global binaries using the pnpm bin command with the global flag. | ✓ 5t |
| flags-dlx-run-package | medium | Use pnpm dlx to run the 'cowsay' package (without installing it permanently) with the argument 'Hello from bench'. The output should contain the greeting. | ✓ 1t |
| flags-exec-in-project | medium | Initialize a project with pnpm init, then use 'pnpm exec' to run 'node -e "console.log(process.cwd())"' which prints the current working directory. | ✓ 5t |
| error-remove-missing-dep | hard | Initialize a project with pnpm init. Then try to remove a package called 'nonexistent-pkg-xyz' that was never installed. Capture pnpm's output. Then add 'is-odd' and successfully remove it, verifying it no longer appears in package.json. | ✓ 6t |
| error-frozen-lockfile | hard | Initialize a project and add 'is-odd'. Then manually edit package.json to add 'is-even' to dependencies (without running pnpm install). Run 'pnpm install --frozen-lockfile' which should fail because the lockfile is out of date. Write 'FROZEN_FAILED' to bench-result.txt if it fails. | ✗ 8t |
| workflow-alias-install | hard | Initialize a project. Install 'is-odd' under the alias 'my-odd' using pnpm's alias syntax (pnpm add my-odd@npm:is-odd). Verify that package.json contains 'my-odd' as a dependency name. | ✓ 4t |
| workflow-script-chain | hard | Create a package.json with three scripts: 'clean' that runs 'rm -rf bench-dist', 'build' that runs 'mkdir -p bench-dist && echo built > bench-dist/output.txt', and 'all' that runs 'pnpm run clean && pnpm run build'. Run the 'all' script with pnpm. Verify bench-dist/output.txt exists with the word 'built'. | ✓ 4t |
| workflow-workspace-cross-ref | hard | Set up a pnpm workspace with two packages: packages/utils (name '@bench/utils', version '1.0.0') and packages/app (name '@bench/app'). Make @bench/app depend on @bench/utils using 'workspace:*' protocol in its package.json dependencies. Then run pnpm install at the workspace root. Verify that packages/app/package.json references @bench/utils. | ✓ 6t |
| workflow-create-and-config | hard | Initialize a project with pnpm init. Set the pnpm config key 'virtual-store-dir' to '.pnpm-store' using 'pnpm config set'. Then read it back with 'pnpm config get virtual-store-dir' and write the value to bench-config.txt. | ✗ 8t |
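The 63% pass rate in the model table can be checked against the per-task marks: 12 of the 19 tasks carry a ✓. The sketch below transcribes the gpt-5-nano column by hand and tallies it overall and per difficulty. The `RESULTS` list and the `tally` helper are illustrative names, not part of any CLIWatch tooling.

```python
from collections import defaultdict

# (task id, difficulty, passed) transcribed by hand from the gpt-5-nano column.
RESULTS = [
    ("quickstart-init-project", "easy", True),
    ("quickstart-add-dependency", "easy", False),
    ("quickstart-run-script", "easy", True),
    ("error-audit-check", "hard", False),
    ("discover-version", "easy", True),
    ("discover-help", "easy", True),
    ("config-dev-dependency", "medium", True),
    ("config-exact-version", "medium", False),
    ("config-workspace-setup", "medium", False),
    ("flags-list-dependencies", "medium", False),
    ("flags-global-bin-path", "medium", True),
    ("flags-dlx-run-package", "medium", True),
    ("flags-exec-in-project", "medium", True),
    ("error-remove-missing-dep", "hard", True),
    ("error-frozen-lockfile", "hard", False),
    ("workflow-alias-install", "hard", True),
    ("workflow-script-chain", "hard", True),
    ("workflow-workspace-cross-ref", "hard", True),
    ("workflow-create-and-config", "hard", False),
]


def tally(results):
    """Return {difficulty: (passed, total)} for a list of result tuples."""
    counts = defaultdict(lambda: [0, 0])
    for _task, difficulty, passed in results:
        counts[difficulty][0] += int(passed)
        counts[difficulty][1] += 1
    return {difficulty: (p, n) for difficulty, (p, n) in counts.items()}


by_difficulty = tally(RESULTS)
passed = sum(p for p, _ in by_difficulty.values())
total = sum(n for _, n in by_difficulty.values())

print(f"overall: {passed}/{total} = {passed / total:.0%}")  # overall: 12/19 = 63%
for difficulty, (p, n) in by_difficulty.items():
    print(f"{difficulty}: {p}/{n}")
```

In this run each difficulty tier has exactly four passes, so the failures are spread across tiers rather than concentrated in the hard tasks.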
Task suite source (272 lines, YAML)
- id: quickstart-init-project
  intent: Initialize a new Node.js project using pnpm init. The resulting
    package.json should exist in the current directory.
  assert:
    - ran: pnpm init
    - file_exists: package.json
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/create.md#pnpm create
- id: quickstart-add-dependency
  intent: Initialize a project with pnpm init, then add 'is-odd' as a dependency
    using pnpm add.
  assert:
    - ran: pnpm
    - ran: pnpm add is-odd
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: is-odd
  setup: []
  max_turns: 4
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/add.md#pnpm add
- id: quickstart-run-script
  intent: Create a package.json that has a script named 'greet' which runs 'echo
    Hello from pnpm'. Then run that script with pnpm.
  assert:
    - ran: pnpm
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: greet
    - output_contains: Hello from pnpm
  setup: []
  max_turns: 4
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/run.md#pnpm run
- id: discover-version
  intent: Check what version of pnpm is installed and print it.
  assert:
    - ran: pnpm.*--version|-v
    - output_contains: .
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/cli/config.md#pnpm config
- id: discover-help
  intent: Show the pnpm help output to see all available commands.
  assert:
    - ran: pnpm.*--help|pnpm help|pnpm -h
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/cli/exec.md#pnpm exec
- id: config-dev-dependency
  intent: Initialize a project and add 'typescript' as a dev dependency (not a
    production dependency). Verify it appears under devDependencies in
    package.json.
  assert:
    - ran: pnpm
    - ran: pnpm add.*-D|pnpm add.*--save-dev
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: devDependencies
    - file_contains:
        path: package.json
        text: typescript
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/cli/add.md#Options
- id: config-exact-version
  intent: Initialize a project and add 'is-odd' with an exact version (no ^ or ~
    prefix in package.json). Use the pnpm flag that saves the exact version.
  assert:
    - ran: pnpm
    - ran: pnpm add.*-E|pnpm add.*--save-exact
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: is-odd
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/cli/add.md#Options
- id: config-workspace-setup
  intent: "Set up a pnpm workspace with two packages: packages/core and
    packages/utils. Each should have its own package.json (use pnpm init in
    each). Create a pnpm-workspace.yaml at the root that includes 'packages/*'."
  assert:
    - ran: pnpm
    - file_exists: pnpm-workspace.yaml
    - file_contains:
        path: pnpm-workspace.yaml
        text: packages
    - file_exists: packages/core/package.json
    - file_exists: packages/utils/package.json
  setup: []
  max_turns: 6
  difficulty: medium
  category: config
  docs_origin: docs/catalogs.md#Catalogs
- id: flags-list-dependencies
  intent: Initialize a project, add 'is-odd' and 'is-even' as dependencies, then
    list all installed dependencies at depth 0.
  assert:
    - ran: pnpm
    - ran: pnpm ls|pnpm list
    - output_contains: is-odd
    - output_contains: is-even
  setup: []
  max_turns: 6
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/add.md#pnpm add
- id: flags-global-bin-path
  intent: Print the directory where pnpm installs global binaries using the pnpm
    bin command with the global flag.
  assert:
    - ran: pnpm bin.*-g|pnpm bin.*--global
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/bin.md#pnpm bin
- id: flags-dlx-run-package
  intent: Use pnpm dlx to run the 'cowsay' package (without installing it
    permanently) with the argument 'Hello from bench'. The output should contain
    the greeting.
  assert:
    - ran: pnpm dlx.*cowsay
    - output_contains: Hello from bench
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/dlx.md#pnpm dlx
- id: flags-exec-in-project
  intent: Initialize a project with pnpm init, then use 'pnpm exec' to run 'node
    -e "console.log(process.cwd())"' which prints the current working directory.
  assert:
    - ran: pnpm
    - ran: pnpm exec
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/exec.md#pnpm exec
- id: error-remove-missing-dep
  intent: Initialize a project with pnpm init. Then try to remove a package called
    'nonexistent-pkg-xyz' that was never installed. Capture pnpm's output. Then
    add 'is-odd' and successfully remove it, verifying it no longer appears in
    package.json.
  assert:
    - ran: pnpm
    - ran: pnpm (remove|rm|uninstall).*is-odd
    - file_exists: package.json
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/add.md#pnpm add
- id: error-frozen-lockfile
  intent: Initialize a project and add 'is-odd'. Then manually edit package.json
    to add 'is-even' to dependencies (without running pnpm install). Run 'pnpm
    install --frozen-lockfile' which should fail because the lockfile is out of
    date. Write 'FROZEN_FAILED' to bench-result.txt if it fails.
  assert:
    - ran: pnpm
    - ran: pnpm install.*--frozen-lockfile
    - file_exists: bench-result.txt
    - file_contains:
        path: bench-result.txt
        text: FROZEN_FAILED
  setup: []
  max_turns: 8
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/install.md#Options
- id: error-audit-check
  intent: Initialize a project, add 'is-odd' as a dependency, then run pnpm audit
    to check for vulnerabilities. Save the audit output to bench-audit.txt.
  assert:
    - ran: pnpm
    - ran: pnpm audit
    - file_exists: bench-audit.txt
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/audit.md#pnpm audit
- id: workflow-alias-install
  intent: Initialize a project. Install 'is-odd' under the alias 'my-odd' using
    pnpm's alias syntax (pnpm add my-odd@npm:is-odd). Verify that package.json
    contains 'my-odd' as a dependency name.
  assert:
    - ran: pnpm
    - ran: pnpm add my-odd@npm:is-odd
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: my-odd
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/aliases.md#Aliases
- id: workflow-script-chain
  intent: "Create a package.json with three scripts: 'clean' that runs 'rm -rf
    bench-dist', 'build' that runs 'mkdir -p bench-dist && echo built >
    bench-dist/output.txt', and 'all' that runs 'pnpm run clean && pnpm run
    build'. Run the 'all' script with pnpm. Verify bench-dist/output.txt exists
    with the word 'built'."
  assert:
    - ran: pnpm
    - ran: pnpm.*all|pnpm run all
    - file_exists: bench-dist/output.txt
    - file_contains:
        path: bench-dist/output.txt
        text: built
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/cli/run.md#pnpm run
- id: workflow-workspace-cross-ref
  intent: "Set up a pnpm workspace with two packages: packages/utils (name
    '@bench/utils', version '1.0.0') and packages/app (name '@bench/app'). Make
    @bench/app depend on @bench/utils using 'workspace:*' protocol in its
    package.json dependencies. Then run pnpm install at the workspace root.
    Verify that packages/app/package.json references @bench/utils."
  assert:
    - ran: pnpm
    - file_exists: pnpm-workspace.yaml
    - file_exists: packages/utils/package.json
    - file_exists: packages/app/package.json
    - file_contains:
        path: packages/app/package.json
        text: "@bench/utils"
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/catalogs.md#Catalogs
- id: workflow-create-and-config
  intent: Initialize a project with pnpm init. Set the pnpm config key
    'virtual-store-dir' to '.pnpm-store' using 'pnpm config set'. Then read it
    back with 'pnpm config get virtual-store-dir' and write the value to
    bench-config.txt.
  assert:
    - ran: pnpm
    - ran: pnpm config set
    - ran: pnpm config get
    - file_exists: bench-config.txt
    - file_contains:
        path: bench-config.txt
        text: .pnpm-store
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/cli/config.md#pnpm config
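The suite does not spell out how each `assert` entry is evaluated. The following is a minimal sketch under stated assumptions: `ran` is treated as a regular expression searched within each command the agent executed, while `file_exists`, `file_contains`, and `output_contains` are treated as plain existence and substring checks. The names `check_assertion` and `task_passes` are hypothetical and are not taken from @cliwatch/cli-bench. The package.json content is simulated, so no pnpm call is made.

```python
import re
import tempfile
from pathlib import Path


def check_assertion(assertion, commands, output, root):
    """Evaluate one single-key assertion mapping, e.g. {"ran": "pnpm init"}."""
    kind, arg = next(iter(assertion.items()))
    if kind == "ran":
        # Assumption: the value is a regex searched within each executed command.
        return any(re.search(arg, command) for command in commands)
    if kind == "file_exists":
        return (root / arg).is_file()
    if kind == "file_contains":
        # Assumption: plain substring match on the file's text.
        target = root / arg["path"]
        return target.is_file() and arg["text"] in target.read_text()
    if kind == "output_contains":
        # Assumption: plain substring match on the combined command output.
        return arg in output
    raise ValueError(f"unknown assertion kind: {kind}")


def task_passes(assertions, commands, output, root):
    """A task passes only if every one of its assertions holds."""
    return all(check_assertion(a, commands, output, Path(root)) for a in assertions)


# Assertions transcribed from the quickstart-add-dependency task.
ADD_DEPENDENCY = [
    {"ran": "pnpm"},
    {"ran": "pnpm add is-odd"},
    {"file_exists": "package.json"},
    {"file_contains": {"path": "package.json", "text": "is-odd"}},
]

with tempfile.TemporaryDirectory() as workdir:
    # Simulate the state an agent would leave behind (illustrative content only).
    Path(workdir, "package.json").write_text('{"dependencies": {"is-odd": "^3.0.1"}}')
    commands = ["pnpm init", "pnpm add is-odd"]
    print(task_passes(ADD_DEPENDENCY, commands, "", workdir))  # True
```

If the regex reading is right, note that alternation binds loosest: a pattern such as `pnpm.*--version|-v` would also match a command like `node -v`, whereas `pnpm.*(--version|-v)` would tie both flags to pnpm.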
Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 19 tasks using @cliwatch/cli-bench.
What you get with CLIWatch
Everything below is running live for pnpm — see the latest run. Set up the same for your CLI in minutes.
| Model | Pass Rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |
CI & PR Comments
Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.
Track Over Time
See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.
thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%
Quality Gates
Set per-model pass rate thresholds. CI fails if evals drop below your targets.
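The gate logic can be pictured as a comparison of each model's pass rate against its threshold. This is a rough sketch, not CLIWatch's implementation: `failing_models` is a hypothetical helper, and pairing the sample table's "Sonnet 4.5", "GPT-4.1", and "Haiku 4.5" rows with the threshold keys is an assumption made for illustration.

```python
def failing_models(thresholds, pass_rates):
    """Return the models whose pass rate (in percent) is below its threshold.

    A model with a threshold but no reported pass rate counts as failing.
    """
    return sorted(
        model
        for model, minimum in thresholds.items()
        if pass_rates.get(model, 0.0) < minimum
    )


# Thresholds from the sample config; pass rates from the sample table (assumed mapping).
thresholds = {"claude-sonnet-4-5": 80, "gpt-4.1": 75, "claude-haiku-4-5": 60}
pass_rates = {"claude-sonnet-4-5": 95, "gpt-4.1": 80, "claude-haiku-4-5": 65}

failing = failing_models(thresholds, pass_rates)
exit_code = 1 if failing else 0  # a CI step would exit with this code
print(failing, exit_code)  # [] 0
```

With the sample numbers every model clears its threshold, so the gate passes. Dropping GPT-4.1 to 70% would put it below its 75% target and fail the check.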
Get this for your CLI
Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.