Pi vs ZCode vs OpenCode

The practical answer: test ZCode first for an integrated GLM-5.2-native desktop workflow and long-running goals; choose Pi for a minimal, programmable terminal harness; choose OpenCode for a polished general-purpose open-source agent with broad providers, IDE/GitHub integration, and granular permissions.

That is a workflow recommendation, not an empirical ranking. We have not run controlled same-task tests proving that one of these harnesses wins. Use the test card below before standardizing.

Harness	Start here when you want	Main trade-off
ZCode	GLM-5.2-native desktop workflow, Goal Mode, built-in subagents, Git/change review	Tighter Z.AI product fit; less provider-neutral
Pi	Small terminal core, sessions, compaction, custom tools and TypeScript extensions	You assemble more of the workflow; Coding Plan authorization is unclear
OpenCode	Broad providers, terminal/desktop/IDE/GitHub use, agents and explicit permissions	Zen, BYO-provider, and Z.AI Coding Plan are separate billing paths

The Model Is Only Half the System

A coding model does not select files, expose tools, decide when to retry, or recover a long session by itself. The harness does. Six levers materially affect useful output:

Context selection: which files, instructions, diffs, and tool results enter the prompt.
Tool interfaces: whether the model gets precise read/edit/test primitives or a vague adapter.
Agent loop: how the harness plans, acts, observes results, and decides what to try next.
Verification and stopping: whether it runs the requested tests, checks the diff, and stops on evidence.
Permissions: what it may read, edit, execute, publish, or delete without approval.
Recovery and compaction: how it preserves decisions when context fills or a tool call fails.
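
The agent loop and the verification-and-stopping lever can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not how Pi, ZCode, or OpenCode are implemented: `propose_patch` and `run_tests` are hypothetical callables standing in for the model call and the test runner.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    success: bool
    attempts: int
    notes: List[str] = field(default_factory=list)


def run_agent_loop(
    propose_patch: Callable[[List[str]], str],  # hypothetical model call; receives prior observations
    run_tests: Callable[[str], bool],           # hypothetical: applies the patch, runs the required tests
    max_attempts: int = 3,
) -> LoopResult:
    """Act/observe loop that stops on passing tests or when the attempt budget is spent."""
    observations: List[str] = []
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(observations)  # act, using what was observed so far
        if run_tests(patch):                 # verify against the requested tests
            observations.append(f"attempt {attempt}: tests passed")
            return LoopResult(True, attempt, observations)  # stop on evidence
        observations.append(f"attempt {attempt}: tests failed for {patch!r}")
    return LoopResult(False, max_attempts, observations)    # budget exhausted; report, do not keep retrying
```

Passing the observations back into each proposal is what lets a retry change its hypothesis rather than repeat the same attempt.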

Harness-Bench isolates this execution layer across shared tasks, budgets, and protocols. Its 5,194 trajectories show substantial variation in completion, process quality, efficiency, and failure behavior across model-harness pairings.

Claw-SWE-Bench gives a concrete GLM example: a minimal direct-diff adapter scored 19.1% Pass@1, while a full adapter reached 73.4% with the same GLM-5.1 backbone. That result shows adapter/interface design can dominate outcomes. It does not measure Pi versus ZCode versus OpenCode, and it does not establish GLM-5.2 performance in any of them.
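
To put the size of that adapter gap in numbers, the arithmetic below uses only the two Pass@1 figures quoted above; the variable names are ours.

```python
minimal_adapter = 19.1  # Pass@1 (%), minimal direct-diff adapter
full_adapter = 73.4     # Pass@1 (%), full adapter, same GLM-5.1 backbone

gap_points = round(full_adapter - minimal_adapter, 1)  # absolute gap in percentage points
ratio = round(full_adapter / minimal_adapter, 2)       # relative improvement

print(f"gap: {gap_points} points, ratio: {ratio}x")    # gap: 54.3 points, ratio: 3.84x
```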

Pi vs ZCode vs OpenCode

Decision point	Pi	ZCode	OpenCode
Primary interface	Terminal TUI, print/JSON, RPC, SDK	Desktop ADE with terminal, Git, tasks, remote and bot controls	Terminal TUI plus desktop, IDE and GitHub integrations
Provider posture	Broad provider support and custom providers	Deep GLM-5.2 integration	Broad provider support; optional OpenCode Zen
Long work	Persistent branching sessions and compaction	Goal Mode iterates until goal verification passes	Primary agents, subagents, sessions and configurable workflows
Extensibility	TypeScript extensions, custom tools, skills, prompt templates, packages	Skills, MCP, plugins, commands and custom subagents	Agents, commands, tools, MCP and provider configuration
Permissions	Project trust plus extension-controlled tool interception	Confirmation modes from confirm-before-changes through fuller access	Per-tool, per-command and per-agent allow/ask/deny rules
GLM-5.2 access	Pi-native Z.AI coding endpoint; Coding Plan authorization is not established	Z.AI product with GLM models and Coding Plan connection	OpenCode Zen PAYG, direct Z.AI PAYG, or Z.AI Coding Plan
Best first test	Programmable terminal workflow	Integrated GLM-5.2 desktop workflow	General-purpose multi-provider workflow
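
The allow/ask/deny idea in the permissions row can be sketched as a first-match rule table. The rule format, the tool names, and the `decide` helper are assumptions for illustration; this is not the configuration schema of OpenCode or any other harness.

```python
from fnmatch import fnmatch
from typing import List, Tuple

# Hypothetical rule table: (tool, command pattern, decision). First match wins.
RULES: List[Tuple[str, str, str]] = [
    ("bash", "git push*", "deny"),
    ("bash", "rm *", "ask"),
    ("bash", "git diff*", "allow"),
    ("edit", "*", "ask"),
    ("read", "*", "allow"),
]


def decide(tool: str, command: str, rules: List[Tuple[str, str, str]] = RULES) -> str:
    """Return 'allow', 'ask', or 'deny'; anything unmatched falls back to 'ask'."""
    for rule_tool, pattern, decision in rules:
        if rule_tool == tool and fnmatch(command, pattern):
            return decision
    return "ask"
```

Defaulting unmatched actions to "ask" keeps a missing rule from silently becoming a permission.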

Three Different Ways To Pay

Do not treat “supports GLM-5.2” as one entitlement.

Path	What it means	Use in
Direct Z.AI PAYG	API usage billed to a Z.AI API account	OpenCode’s `Z.AI` provider; other compatible clients
OpenCode Zen	Optional OpenCode gateway; add credits and pay per request at each model's price	OpenCode only
GLM Coding Plan	Subscription quota restricted to Z.AI’s officially supported tools and products	ZCode and listed integrations such as OpenCode

For the subscription rules, supported tools, and quotas, see the Z.AI Coding Plan guide. For model specs and PAYG pricing, see the GLM-5.2 page.

Copyable Productive Tasks

These prompts constrain scope, define evidence, and make harness behavior easier to compare.

1. Bounded failing-test repair

Fix only the failure in tests/auth/session-expiry.test.ts.

Constraints:
- Reproduce the failure first.
- Read the smallest relevant implementation surface.
- Do not change public APIs or unrelated tests.
- Run the failing test after the patch, then the nearest auth test suite.
- Show the final diff and explain why it fixes the root cause.
- Stop and report if the test cannot be reproduced.

2. Read-only repository audit

Audit this repository for places where untrusted input reaches shell execution.

Read-only rules:
- Do not edit files, install packages, or run network commands.
- You may use repository search, git log, and existing static-analysis commands.
- Report file:line evidence, reachable data flow, severity, and uncertainty.
- Separate confirmed findings from hypotheses.
- End with the three highest-value verification steps.

3. Multi-file refactor with review

Replace the duplicated retry logic with one shared helper.

Acceptance:
- Preserve existing public behavior and error types.
- Add or update focused tests before deleting the old paths.
- Run the focused tests and the repository's standard lint/typecheck gates.
- Review git diff --check and the final diff for unrelated changes.
- List every changed file and why it changed.
- Do not commit, push, or publish.
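
A reviewer can check the diff-related acceptance items from the refactor prompt independently of the harness. The sketch below assumes it runs inside the repository; `out_of_scope` and the example paths in the note after it are hypothetical. `git diff --check` exits non-zero when it finds whitespace errors or conflict markers, and `git diff --name-only HEAD` lists tracked files that differ from the last commit (untracked files are not included).

```python
import subprocess
from typing import Iterable, List


def changed_files() -> List[str]:
    """List tracked files that differ from HEAD (staged or unstaged)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def whitespace_clean() -> bool:
    """True when `git diff --check` reports no whitespace errors or conflict markers."""
    return subprocess.run(["git", "diff", "--check"], capture_output=True).returncode == 0


def out_of_scope(files: Iterable[str], allowed_prefixes: Iterable[str]) -> List[str]:
    """Return changed files outside the directories the task was scoped to."""
    prefixes = tuple(allowed_prefixes)
    return [f for f in files if not f.startswith(prefixes)]
```

For example, `out_of_scope(changed_files(), ["src/retry/", "tests/retry/"])` would flag any edit outside those two hypothetical directories as an unrelated change to question.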

Same-Model Harness Test Card

Run the same model, task, repository commit, instructions, budget, and timeout in each harness. Repeat enough times to expose flaky behavior.

Model: GLM-5.2
Harness/version:
Repository commit:
Task:
Budget/timeout:
Permission mode:

Success (yes/no/partial):
Required tests passed:
Files changed:
Input/output tokens or quota consumed:
Retries/tool calls:
Human interventions:
Unsafe or out-of-scope actions:
Diff quality notes:
Failure/recovery notes:

Compare successful patches per unit of cost/quota and review time, not just whether the harness eventually produced a diff.
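
One way to make that comparison concrete is to record each run as a small record and normalize successes by cost and review time. This is a sketch, not a standard metric: the `RunCard` fields mirror a subset of the card above, and the two helper functions are our own naming.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunCard:
    harness: str
    success: bool
    tokens: int            # input + output tokens, or whichever quota unit you record
    review_minutes: float  # human time spent reviewing the final diff


def successes_per_million_tokens(cards: List[RunCard]) -> Optional[float]:
    """Successful runs per million tokens spent across all runs, failures included."""
    total_tokens = sum(c.tokens for c in cards)
    if total_tokens == 0:
        return None
    return sum(c.success for c in cards) * 1_000_000 / total_tokens


def review_minutes_per_success(cards: List[RunCard]) -> Optional[float]:
    """Total review time divided by successful runs; None when nothing succeeded."""
    wins = sum(c.success for c in cards)
    if wins == 0:
        return None
    return sum(c.review_minutes for c in cards) / wins
```

Counting the tokens and review time of failed runs in the totals is deliberate: a harness that succeeds only after several expensive failures should not look cheap.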

Why 1M Context Does Not Guarantee Productivity

One million tokens is capacity, not a promise that the right evidence will be selected or retained. Dumping an entire repository into context can dilute relevant instructions, increase latency, and make failures harder to diagnose. Agent loops also multiply usage: Z.AI estimates one Coding Plan prompt may invoke a model 15–20 times.
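
The multiplication is easy to underestimate. A rough sketch using only the 15–20 invocations-per-prompt estimate quoted above; the function name and the example prompt count are illustrative.

```python
from typing import Tuple


def estimated_model_calls(prompts: int, low: int = 15, high: int = 20) -> Tuple[int, int]:
    """Range of model invocations for a number of prompts, using the 15-20x estimate."""
    if prompts < 0:
        raise ValueError("prompts must be non-negative")
    return prompts * low, prompts * high


print(estimated_model_calls(40))  # (600, 800)
```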

Costs rise when a harness:

rereads large files instead of using targeted search;
retries without changing its hypothesis;
launches redundant subagents;
carries noisy tool output forward;
compacts away constraints or earlier test evidence;
keeps working after acceptance criteria already pass.

Start with the smallest sufficient context. Record quota/tokens, retries, interventions, and test evidence in the test card.
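
The compaction failure in the list above, losing constraints or earlier test evidence, can be illustrated with a toy policy that pins those messages. This is an assumption-laden sketch, not the compaction algorithm of Pi or any other harness; `Message`, `pinned`, and `compact` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    text: str
    pinned: bool = False  # constraints and test evidence that must survive compaction


def compact(history: List[Message], keep_recent: int) -> List[Message]:
    """Keep every pinned message plus the last `keep_recent` unpinned ones, in original order."""
    unpinned_idx = [i for i, m in enumerate(history) if not m.pinned]
    keep = set(unpinned_idx[-keep_recent:]) if keep_recent > 0 else set()
    return [m for i, m in enumerate(history) if m.pinned or i in keep]
```

Real harnesses usually summarize rather than drop old messages; the point here is only that constraints and evidence are treated differently from noisy tool output.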

Recommendation By Workflow

Choose ZCode when GLM-5.2 is the primary model and you want an integrated desktop environment with explicit goals, ongoing verification, safety confirmations, and built-in collaboration features.
Choose Pi when you want a small terminal harness that can become your own tool through extensions, custom tools, session branching, and customizable compaction.
Choose OpenCode when you need a provider-neutral default, explicit permission policies, reusable agents, and a path across terminal, desktop, IDE, and GitHub workflows.
Keep testing when the task is high risk. None of these feature lists proves lower defect rates in your repository.

Sources

Last verified: July 2, 2026. Harness features, model routing, prices, and subscription authorization can change independently.
