# agent manages a monorepo
$ pnpm add lodash
  + lodash 4.17.21
 
$ pnpm ls --depth=0
  dependencies:
    lodash 4.17.21

Can AI agents use pnpm?

Fast, disk-efficient JavaScript package manager. Agents use it to install dependencies, run scripts, and manage monorepo workspaces.

63% overall pass rate · 1 model tested · 19 tasks · v10.30.3 · 3/6/2026

pnpm eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
|---|---|---|---|
| gpt-5-nano | 63% | 3.7 | 8.1k |

pnpm task results by model

| Task | Difficulty | Description | gpt-5-nano (turns) |
|---|---|---|---|
| quickstart-init-project | easy | Initialize a new Node.js project using pnpm init. The resulting package.json should exist in the current directory. | 3 |
| quickstart-add-dependency | easy | Initialize a project with pnpm init, then add 'is-odd' as a dependency using pnpm add. | 4 |
| quickstart-run-script | easy | Create a package.json that has a script named 'greet' which runs 'echo Hello from pnpm'. Then run that script with pnpm. | 2 |
| error-audit-check | hard | Initialize a project, add 'is-odd' as a dependency, then run pnpm audit to check for vulnerabilities. Save the audit output to bench-audit.txt. | 6 |
| discover-version | easy | Check what version of pnpm is installed and print it. | 1 |
| discover-help | easy | Show the pnpm help output to see all available commands. | 2 |
| config-dev-dependency | medium | Initialize a project and add 'typescript' as a dev dependency (not a production dependency). Verify it appears under devDependencies in package.json. | 5 |
| config-exact-version | medium | Initialize a project and add 'is-odd' with an exact version (no ^ or ~ prefix in package.json). Use the pnpm flag that saves the exact version. | 5 |
| config-workspace-setup | medium | Set up a pnpm workspace with two packages: packages/core and packages/utils. Each should have its own package.json (use pnpm init in each). Create a pnpm-workspace.yaml at the root that includes 'packages/*'. | 6 |
| flags-list-dependencies | medium | Initialize a project, add 'is-odd' and 'is-even' as dependencies, then list all installed dependencies at depth 0. | 6 |
| flags-global-bin-path | medium | Print the directory where pnpm installs global binaries using the pnpm bin command with the global flag. | 5 |
| flags-dlx-run-package | medium | Use pnpm dlx to run the 'cowsay' package (without installing it permanently) with the argument 'Hello from bench'. The output should contain the greeting. | 1 |
| flags-exec-in-project | medium | Initialize a project with pnpm init, then use 'pnpm exec' to run 'node -e "console.log(process.cwd())"' which prints the current working directory. | 5 |
| error-remove-missing-dep | hard | Initialize a project with pnpm init. Then try to remove a package called 'nonexistent-pkg-xyz' that was never installed. Capture pnpm's output. Then add 'is-odd' and successfully remove it, verifying it no longer appears in package.json. | 6 |
| error-frozen-lockfile | hard | Initialize a project and add 'is-odd'. Then manually edit package.json to add 'is-even' to dependencies (without running pnpm install). Run 'pnpm install --frozen-lockfile' which should fail because the lockfile is out of date. Write 'FROZEN_FAILED' to bench-result.txt if it fails. | 8 |
| workflow-alias-install | hard | Initialize a project. Install 'is-odd' under the alias 'my-odd' using pnpm's alias syntax (pnpm add my-odd@npm:is-odd). Verify that package.json contains 'my-odd' as a dependency name. | 4 |
| workflow-script-chain | hard | Create a package.json with three scripts: 'clean' that runs 'rm -rf bench-dist', 'build' that runs 'mkdir -p bench-dist && echo built > bench-dist/output.txt', and 'all' that runs 'pnpm run clean && pnpm run build'. Run the 'all' script with pnpm. Verify bench-dist/output.txt exists with the word 'built'. | 4 |
| workflow-workspace-cross-ref | hard | Set up a pnpm workspace with two packages: packages/utils (name '@bench/utils', version '1.0.0') and packages/app (name '@bench/app'). Make @bench/app depend on @bench/utils using 'workspace:*' protocol in its package.json dependencies. Then run pnpm install at the workspace root. Verify that packages/app/package.json references @bench/utils. | 6 |
| workflow-create-and-config | hard | Initialize a project with pnpm init. Set the pnpm config key 'virtual-store-dir' to '.pnpm-store' using 'pnpm config set'. Then read it back with 'pnpm config get virtual-store-dir' and write the value to bench-config.txt. | 8 |
Task suite source (272 lines · YAML)
- id: quickstart-init-project
  intent: Initialize a new Node.js project using pnpm init. The resulting
    package.json should exist in the current directory.
  assert:
    - ran: pnpm init
    - file_exists: package.json
  setup: []
  max_turns: 3
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/create.md#pnpm create
- id: quickstart-add-dependency
  intent: Initialize a project with pnpm init, then add 'is-odd' as a dependency
    using pnpm add.
  assert:
    - ran: pnpm
    - ran: pnpm add is-odd
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: is-odd
  setup: []
  max_turns: 4
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/add.md#pnpm add
- id: quickstart-run-script
  intent: Create a package.json that has a script named 'greet' which runs 'echo
    Hello from pnpm'. Then run that script with pnpm.
  assert:
    - ran: pnpm
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: greet
    - output_contains: Hello from pnpm
  setup: []
  max_turns: 4
  difficulty: easy
  category: getting-started
  docs_origin: docs/cli/run.md#pnpm run
- id: discover-version
  intent: Check what version of pnpm is installed and print it.
  assert:
    - ran: pnpm.*--version|-v
    - output_contains: .
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/cli/config.md#pnpm config
- id: discover-help
  intent: Show the pnpm help output to see all available commands.
  assert:
    - ran: pnpm.*--help|pnpm help|pnpm -h
  setup: []
  max_turns: 3
  difficulty: easy
  category: command-discovery
  docs_origin: docs/cli/exec.md#pnpm exec
- id: config-dev-dependency
  intent: Initialize a project and add 'typescript' as a dev dependency (not a
    production dependency). Verify it appears under devDependencies in
    package.json.
  assert:
    - ran: pnpm
    - ran: pnpm add.*-D|pnpm add.*--save-dev
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: devDependencies
    - file_contains:
        path: package.json
        text: typescript
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/cli/add.md#Options
- id: config-exact-version
  intent: Initialize a project and add 'is-odd' with an exact version (no ^ or ~
    prefix in package.json). Use the pnpm flag that saves the exact version.
  assert:
    - ran: pnpm
    - ran: pnpm add.*-E|pnpm add.*--save-exact
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: is-odd
  setup: []
  max_turns: 5
  difficulty: medium
  category: config
  docs_origin: docs/cli/add.md#Options
- id: config-workspace-setup
  intent: "Set up a pnpm workspace with two packages: packages/core and
    packages/utils. Each should have its own package.json (use pnpm init in
    each). Create a pnpm-workspace.yaml at the root that includes 'packages/*'."
  assert:
    - ran: pnpm
    - file_exists: pnpm-workspace.yaml
    - file_contains:
        path: pnpm-workspace.yaml
        text: packages
    - file_exists: packages/core/package.json
    - file_exists: packages/utils/package.json
  setup: []
  max_turns: 6
  difficulty: medium
  category: config
  docs_origin: docs/catalogs.md#Catalogs
- id: flags-list-dependencies
  intent: Initialize a project, add 'is-odd' and 'is-even' as dependencies, then
    list all installed dependencies at depth 0.
  assert:
    - ran: pnpm
    - ran: pnpm ls|pnpm list
    - output_contains: is-odd
    - output_contains: is-even
  setup: []
  max_turns: 6
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/add.md#pnpm add
- id: flags-global-bin-path
  intent: Print the directory where pnpm installs global binaries using the pnpm
    bin command with the global flag.
  assert:
    - ran: pnpm bin.*-g|pnpm bin.*--global
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/bin.md#pnpm bin
- id: flags-dlx-run-package
  intent: Use pnpm dlx to run the 'cowsay' package (without installing it
    permanently) with the argument 'Hello from bench'. The output should contain
    the greeting.
  assert:
    - ran: pnpm dlx.*cowsay
    - output_contains: Hello from bench
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/dlx.md#pnpm dlx
- id: flags-exec-in-project
  intent: Initialize a project with pnpm init, then use 'pnpm exec' to run 'node
    -e "console.log(process.cwd())"' which prints the current working directory.
  assert:
    - ran: pnpm
    - ran: pnpm exec
  setup: []
  max_turns: 5
  difficulty: medium
  category: flag-parsing
  docs_origin: docs/cli/exec.md#pnpm exec
- id: error-remove-missing-dep
  intent: Initialize a project with pnpm init. Then try to remove a package called
    'nonexistent-pkg-xyz' that was never installed. Capture pnpm's output. Then
    add 'is-odd' and successfully remove it, verifying it no longer appears in
    package.json.
  assert:
    - ran: pnpm
    - ran: pnpm (remove|rm|uninstall).*is-odd
    - file_exists: package.json
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/add.md#pnpm add
- id: error-frozen-lockfile
  intent: Initialize a project and add 'is-odd'. Then manually edit package.json
    to add 'is-even' to dependencies (without running pnpm install). Run 'pnpm
    install --frozen-lockfile' which should fail because the lockfile is out of
    date. Write 'FROZEN_FAILED' to bench-result.txt if it fails.
  assert:
    - ran: pnpm
    - ran: pnpm install.*--frozen-lockfile
    - file_exists: bench-result.txt
    - file_contains:
        path: bench-result.txt
        text: FROZEN_FAILED
  setup: []
  max_turns: 8
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/install.md#Options
- id: error-audit-check
  intent: Initialize a project, add 'is-odd' as a dependency, then run pnpm audit
    to check for vulnerabilities. Save the audit output to bench-audit.txt.
  assert:
    - ran: pnpm
    - ran: pnpm audit
    - file_exists: bench-audit.txt
  setup: []
  max_turns: 6
  difficulty: hard
  category: error-recovery
  docs_origin: docs/cli/audit.md#pnpm audit
- id: workflow-alias-install
  intent: Initialize a project. Install 'is-odd' under the alias 'my-odd' using
    pnpm's alias syntax (pnpm add my-odd@npm:is-odd). Verify that package.json
    contains 'my-odd' as a dependency name.
  assert:
    - ran: pnpm
    - ran: pnpm add my-odd@npm:is-odd
    - file_exists: package.json
    - file_contains:
        path: package.json
        text: my-odd
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/aliases.md#Aliases
- id: workflow-script-chain
  intent: "Create a package.json with three scripts: 'clean' that runs 'rm -rf
    bench-dist', 'build' that runs 'mkdir -p bench-dist && echo built >
    bench-dist/output.txt', and 'all' that runs 'pnpm run clean && pnpm run
    build'. Run the 'all' script with pnpm. Verify bench-dist/output.txt exists
    with the word 'built'."
  assert:
    - ran: pnpm
    - ran: pnpm.*all|pnpm run all
    - file_exists: bench-dist/output.txt
    - file_contains:
        path: bench-dist/output.txt
        text: built
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/cli/run.md#pnpm run
- id: workflow-workspace-cross-ref
  intent: "Set up a pnpm workspace with two packages: packages/utils (name
    '@bench/utils', version '1.0.0') and packages/app (name '@bench/app'). Make
    @bench/app depend on @bench/utils using 'workspace:*' protocol in its
    package.json dependencies. Then run pnpm install at the workspace root.
    Verify that packages/app/package.json references @bench/utils."
  assert:
    - ran: pnpm
    - file_exists: pnpm-workspace.yaml
    - file_exists: packages/utils/package.json
    - file_exists: packages/app/package.json
    - file_contains:
        path: packages/app/package.json
        text: "@bench/utils"
  setup: []
  max_turns: 10
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/catalogs.md#Catalogs
- id: workflow-create-and-config
  intent: Initialize a project with pnpm init. Set the pnpm config key
    'virtual-store-dir' to '.pnpm-store' using 'pnpm config set'. Then read it
    back with 'pnpm config get virtual-store-dir' and write the value to
    bench-config.txt.
  assert:
    - ran: pnpm
    - ran: pnpm config set
    - ran: pnpm config get
    - file_exists: bench-config.txt
    - file_contains:
        path: bench-config.txt
        text: .pnpm-store
  setup: []
  max_turns: 8
  difficulty: hard
  category: multi-step-workflow
  docs_origin: docs/cli/config.md#pnpm config

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 19 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for pnpm; see the latest run. Set up the same for your CLI in minutes.

| Model | Pass rate | Delta |
|---|---|---|
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

(Chart: pass rate over the last 30 days, v1.0 to v1.6)

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
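To make the gating rule concrete, here is a hypothetical shell check (not CLIWatch's implementation) that compares each model's pass rate with the thresholds in the config above. The pass rates are the sample figures from the PR-comment table; `check_gate` and `run_gates` are illustrative names.

```shell
#!/usr/bin/env sh
# Hypothetical quality-gate check, not CLIWatch's implementation.
# Pass rates and thresholds are whole-number percentages.
set -eu

# check_gate MODEL PASS_RATE THRESHOLD: prints a verdict, returns 1 below the threshold.
check_gate() {
  if [ "$2" -lt "$3" ]; then
    echo "FAIL $1: $2% is below the $3% threshold"
    return 1
  fi
  echo "PASS $1: $2% meets the $3% threshold"
}

# run_gates: checks every model, then fails if any one of them missed its target.
run_gates() {
  status=0
  check_gate claude-sonnet-4-5 95 80 || status=1
  check_gate gpt-4.1 80 75 || status=1
  check_gate claude-haiku-4-5 65 60 || status=1
  return "$status"
}

# In CI, a non-zero return here would fail the job.
run_gates
```

`run_gates` evaluates every model before returning, so a single CI run reports all regressions instead of stopping at the first one.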

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.

Compare other CLI evals