# agent runs build targets
$ make build
  gcc -Wall -O2 -o app main.c
 
$ make test
  Running tests...
  All 12 tests passed.

Can AI agents use GNU Make?

The classic build automation tool. Agents use it to run build targets, manage dependencies between tasks, and automate project workflows.

100% overall pass rate · 1 model tested · 4 tasks · GNU Make 4.3 · 3/6/2026

GNU Make eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
| --- | --- | --- | --- |
| gpt-5-nano | 100% | 2.8 | 14.5k |

GNU Make task results by model

| Task | Difficulty | Description | gpt-5-nano |
| --- | --- | --- | --- |
| basic-makefile | easy | Create a Makefile with a 'hello' target that prints 'Hello, Make!' to stdout. Run the target. | 2 turns |
| multi-target | medium | Create a Makefile with three targets: 'build' that creates a file called build.txt containing 'built', 'clean' that removes build.txt, and 'all' that depends on build. Make 'all' the default target. Run 'make' and verify build.txt exists. | 3 turns |
| variables-and-patterns | easy | Create a Makefile that defines a variable CC=gcc and a variable CFLAGS=-Wall -O2. Add a target 'info' that prints both variables. Run 'make info'. | 4 turns |
| phony-and-help | medium | Create a Makefile with .PHONY declarations for 'test', 'lint', and 'help' targets. The 'help' target should parse the Makefile and print each target with a description from ## comments. The 'test' target should echo 'running tests'. The 'lint' target should echo 'running linter'. Run 'make help'. | 2 turns |
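
The phony-and-help task is the least obvious of the four, so here is a minimal sketch of one Makefile that would satisfy its description. It is an illustration only, not the file any model actually produced. The description texts after `##`, the two-space column layout, and the scratch directory are assumptions; `printf` is used to write the file so that the tab Make requires before each recipe line is explicit.

```shell
# Work in a scratch directory so no existing Makefile is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

{
  printf '%s\n'   '.PHONY: test lint help' ''
  printf '%s\n'   'help: ## Show this help'
  # sed keeps only lines shaped like "name: ## description" and prints
  # "name  description"; MAKEFILE_LIST is Make's list of parsed makefiles.
  printf '\t%s\n' '@sed -n "s/^\([a-z]*\):.*## /\1  /p" $(MAKEFILE_LIST)'
  printf '%s\n'   '' 'test: ## Run the tests'
  printf '\t%s\n' '@echo "running tests"'
  printf '%s\n'   '' 'lint: ## Run the linter'
  printf '\t%s\n' '@echo "running linter"'
} > Makefile

make help
```

Because Make treats everything after `#` on a rule line as a comment, the `##` descriptions do not affect the build; only the `help` recipe reads them. The `[a-z]*` pattern in this sketch matches lowercase target names only, which is enough for `help`, `test`, and `lint`.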
Task suite source (61 lines · YAML)
- id: basic-makefile
  intent: Create a Makefile with a 'hello' target that prints 'Hello, Make!' to
    stdout. Run the target.
  assert:
    - file_exists: Makefile
    - ran: make.*hello|make$
    - output_contains: Hello, Make!
  setup: []
  max_turns: 3
  difficulty: easy
  category: basics
- id: multi-target
  intent: "Create a Makefile with three targets: 'build' that creates a file
    called build.txt containing 'built', 'clean' that removes build.txt, and
    'all' that depends on build. Make 'all' the default target. Run 'make' and
    verify build.txt exists."
  assert:
    - file_exists: Makefile
    - ran: make
    - file_exists: build.txt
    - file_contains:
        path: build.txt
        text: built
  setup: []
  max_turns: 5
  difficulty: medium
  category: targets
- id: variables-and-patterns
  intent: Create a Makefile that defines a variable CC=gcc and a variable
    CFLAGS=-Wall -O2. Add a target 'info' that prints both variables. Run 'make
    info'.
  assert:
    - file_exists: Makefile
    - file_contains:
        path: Makefile
        text: CC
    - ran: make info
    - output_contains: gcc
  setup: []
  max_turns: 4
  difficulty: easy
  category: variables
- id: phony-and-help
  intent: "Create a Makefile with .PHONY declarations for 'test', 'lint', and
    'help' targets. The 'help' target should parse the Makefile and print each
    target with a description from ## comments. The 'test' target should echo
    'running tests'. The 'lint' target should echo 'running linter'. Run 'make
    help'."
  assert:
    - file_exists: Makefile
    - file_contains:
        path: Makefile
        text: .PHONY
    - ran: make help
    - output_contains: test
    - output_contains: lint
  setup: []
  max_turns: 6
  difficulty: medium
  category: workflow
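
To make the `ran` pattern of the basic-makefile task concrete, the sketch below checks a few candidate commands against `make.*hello|make$` and then runs the target both ways. It assumes that `ran` is an extended regular expression searched within the command line the agent executed; how @cliwatch/cli-bench actually evaluates it is not shown in this page. The candidate commands and the scratch directory are illustrative.

```shell
# Work in a scratch directory so no existing Makefile is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 'hello' is the first ordinary target, so it is also Make's default goal.
printf 'hello:\n\t@echo "Hello, Make!"\n\n.PHONY: hello\n' > Makefile

# Assumption: the 'ran' assertion is an extended regex matched against the command.
pattern='make.*hello|make$'
for cmd in 'make hello' 'make' 'make clean'; do
  if printf '%s\n' "$cmd" | grep -Eq "$pattern"; then
    echo "matches:        $cmd"
  else
    echo "does not match: $cmd"
  fi
done

make hello   # explicit goal
make         # default goal, runs the same recipe
```

Under that reading, the `make$` alternative is what lets a bare `make` count as running the target, which is reasonable here because `hello` is the default goal.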

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for GNU Make (see the latest run). Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
| --- | --- | --- |
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, from v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
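
As a rough illustration of the gating rule, the shell sketch below compares each model's pass rate with its threshold and records a failure if any model falls short. The `check_gate` helper is hypothetical and not part of CLIWatch. The pass rates are the sample figures from the PR comment table above, matched to the threshold keys by assumption, and percentages are treated as whole numbers.

```shell
# check_gate MODEL PASS_RATE THRESHOLD
# Hypothetical helper: succeeds when PASS_RATE is at or above THRESHOLD.
check_gate() {
  if [ "$2" -lt "$3" ]; then
    echo "FAIL $1: $2% is below the $3% threshold"
    return 1
  fi
  echo "ok   $1: $2% meets the $3% threshold"
}

status=0
check_gate claude-sonnet-4-5 95 80 || status=1
check_gate gpt-4.1           80 75 || status=1
check_gate claude-haiku-4-5  65 60 || status=1

echo "gate status: $status"
```

In a CI job, the script would finish with `exit "$status"` so that a single failing model fails the whole step.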

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.
