# agent checks pod health
$ kubectl get pods -n production -o json
  {"items": [{"metadata": {"name": "api-7d4.."},
   "status": {"phase": "Running"}}]}
 
$ kubectl logs api-7d4.. -n production --tail=20
  2026-02-08 INFO  Server started on :8080

Can AI agents use kubectl?

kubectl is the Kubernetes command-line tool. Agents use it to manage clusters, inspect workloads, debug pods, and apply manifests.

50% overall pass rate · 1 model tested · 4 tasks · v1.35.2 · 3/6/2026

kubectl eval results by model

| Model | Pass rate | Avg turns | Avg tokens |
| --- | --- | --- | --- |
| gpt-5-nano | 50% | 3.0 | 7.3k |

kubectl task results by model

| Task | Difficulty | Description | gpt-5-nano |
| --- | --- | --- | --- |
| create-pod-yaml | easy | Generate a Pod manifest YAML file called pod.yaml for a pod named 'web' running the nginx:alpine image on port 80, using kubectl create with --dry-run=client and -o yaml. | 4t |
| create-deployment-yaml | easy | Generate a Deployment manifest for a deployment named 'api' with 3 replicas running the node:20-alpine image. Save it to deployment.yaml using --dry-run=client -o yaml. | 2t |
| explain-resource | easy | Use kubectl explain to show the documentation for a Pod's spec.containers field. | 2t |
| kustomize-build | hard | Create a kustomization.yaml file that includes a resource file called deployment.yaml. Then use kubectl kustomize to build and output the result. | 4t |
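
For the one hard task, kustomize-build, a passing workflow might look roughly like the sketch below. It is an illustration under assumptions, not a recorded agent run: the scratch directory and the `workdir` variable are hypothetical, deployment.yaml is generated with the same command the suite uses in its setup step, and the kubectl calls are skipped when kubectl is not on the PATH.

```shell
#!/bin/sh
# Sketch of the kustomize-build task in a throwaway directory.
# `workdir` is a hypothetical name, not part of the task suite.
workdir=$(mktemp -d)

# Minimal kustomization that lists deployment.yaml as its only resource.
cat > "$workdir/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
EOF

if command -v kubectl >/dev/null 2>&1; then
  # Same command as the suite's setup step; --dry-run=client only renders YAML.
  kubectl create deployment api --image=node:20-alpine --replicas=2 \
    --dry-run=client -o yaml > "$workdir/deployment.yaml" \
    || echo "could not generate deployment.yaml" >&2
  # Build the kustomization directory and print the rendered manifests.
  kubectl kustomize "$workdir" || echo "kustomize build failed" >&2
fi
```

The task's assertions only require that kustomization.yaml exists and that a kustomize command was run, so the rendered output itself is not checked.
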
Task suite source · 56 lines · YAML
- id: create-pod-yaml
  intent: Generate a Pod manifest YAML file called pod.yaml for a pod named 'web'
    running the nginx:alpine image on port 80, using kubectl create with
    --dry-run=client and -o yaml.
  assert:
    - file_exists: pod.yaml
    - file_contains:
        path: pod.yaml
        text: nginx
    - file_contains:
        path: pod.yaml
        text: web
    - ran: kubectl.*--dry-run
  setup: []
  max_turns: 4
  difficulty: easy
  category: generate
- id: create-deployment-yaml
  intent: Generate a Deployment manifest for a deployment named 'api' with 3
    replicas running the node:20-alpine image. Save it to deployment.yaml using
    --dry-run=client -o yaml.
  assert:
    - file_exists: deployment.yaml
    - file_contains:
        path: deployment.yaml
        text: replicas
    - file_contains:
        path: deployment.yaml
        text: api
  setup: []
  max_turns: 4
  difficulty: easy
  category: generate
- id: explain-resource
  intent: Use kubectl explain to show the documentation for a Pod's
    spec.containers field.
  assert:
    - ran: kubectl explain
    - output_contains: containers
  setup: []
  max_turns: 3
  difficulty: easy
  category: query
- id: kustomize-build
  intent: Create a kustomization.yaml file that includes a resource file called
    deployment.yaml. Then use kubectl kustomize to build and output the result.
  assert:
    - file_exists: kustomization.yaml
    - ran: kubectl kustomize|kustomize build
  setup:
    - kubectl create deployment api --image=node:20-alpine --replicas=2
      --dry-run=client -o yaml > deployment.yaml
  max_turns: 8
  difficulty: hard
  category: workflow
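
One way to read the `assert` entries is as small predicates over the working directory and the agent's command history. The sketch below is an assumption about how such checks could be evaluated, not the @cliwatch/cli-bench implementation; the helper names, the stand-in pod.yaml, and the sample command log are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the assert keys in the suite above.

# file_exists PATH: succeeds if PATH is a regular file.
file_exists() { [ -f "$1" ]; }

# file_contains PATH TEXT: succeeds if the file contains the literal TEXT.
file_contains() { grep -qF -- "$2" "$1"; }

# ran REGEX LOG: succeeds if any line of the newline-separated command
# log matches the extended regular expression REGEX.
ran() { printf '%s\n' "$2" | grep -qE -- "$1"; }

# Example: the create-pod-yaml assertions against a stand-in pod.yaml.
dir=$(mktemp -d)
printf 'metadata:\n  name: web\nspec:\n  containers:\n  - image: nginx:alpine\n' \
  > "$dir/pod.yaml"
cmd_log='kubectl run web --image=nginx:alpine --port=80 --dry-run=client -o yaml'

if file_exists "$dir/pod.yaml" &&
   file_contains "$dir/pod.yaml" nginx &&
   file_contains "$dir/pod.yaml" web &&
   ran 'kubectl.*--dry-run' "$cmd_log"; then
  result=pass
else
  result=fail
fi
echo "create-pod-yaml: $result"
```

Under this reading, a task passes only when every assertion holds, which is why a single missing file or unmatched command pattern is enough to fail it.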

Evals are a snapshot, not a verdict. We run identical tasks across all models to keep comparisons fair. Results vary with CLI version, task selection, and model updates. Evals run weekly on 4 tasks using @cliwatch/cli-bench.

What you get with CLIWatch

Everything below is running live for kubectl (see the latest run). Set up the same for your CLI in minutes.

| Model | Pass Rate | Delta |
| --- | --- | --- |
| Sonnet 4.5 | 95% | +5% |
| GPT-4.1 | 80% | -5% |
| Haiku 4.5 | 65% | -10% |

CI & PR Comments

Get automated PR comments with per-model pass rates, regressions, and a link to the full comparison dashboard.

[Chart: pass rate over the last 30 days, v1.0 to v1.6]

Track Over Time

See how your CLI's agent compatibility changes across releases. Spot trends and regressions at a glance.

thresholds:
  claude-sonnet-4-5: 80%
  gpt-4.1: 75%
  claude-haiku-4-5: 60%

Quality Gates

Set per-model pass rate thresholds. CI fails if evals drop below your targets.
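
As a rough sketch of the gate logic, the check reduces to comparing each model's pass rate against its threshold and exiting non-zero on any shortfall. The script below is an illustration under assumptions: the `gate` function is hypothetical, the rates are the sample figures shown on this page rather than real run output, and pairing them with the threshold keys by model name is assumed.

```shell
#!/bin/sh
# Hypothetical quality gate: compare integer pass rates (%) to thresholds.

# gate MODEL RATE THRESHOLD: succeeds when RATE >= THRESHOLD.
gate() {
  if [ "$2" -ge "$3" ]; then
    echo "ok   $1: $2% >= $3%"
  else
    echo "FAIL $1: $2% < $3%"
    return 1
  fi
}

failed=0
# Sample pass rates paired with the thresholds from the config above.
gate claude-sonnet-4-5 95 80 || failed=1
gate gpt-4.1 80 75 || failed=1
gate claude-haiku-4-5 65 60 || failed=1

# In CI, a non-zero exit status is what fails the job.
if [ "$failed" -ne 0 ]; then
  echo "quality gate failed"
  exit 1
fi
echo "all gates passed"
```

With the sample figures every model clears its threshold, so this run would pass; dropping any rate below its target flips the exit status.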

Get this for your CLI

Run evals in CI, get PR comments with regressions, track pass rates over time, and gate merges on quality thresholds — all from a single GitHub Actions workflow.
