The Problem
AI coding agents work in single sessions. You give them a prompt, they execute, they’re done. But real implementations have multiple sections that depend on each other - section B needs section A’s output, sections C and D can run in parallel, section E waits for both.
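Structurally that's a DAG, and which sections can run side by side falls out of a topological sort. Below is a minimal Python sketch using Kahn's algorithm, with sections grouped into waves. The dict-of-prerequisites format and `execution_waves` are illustrative assumptions. The example also assumes that C and D depend on A, as in the sample plan further down.

```python
def execution_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group sections into waves; each wave depends only on earlier waves (Kahn's algorithm)."""
    unknown = {p for pre in deps.values() for p in pre if p not in deps}
    if unknown:
        raise ValueError(f"unknown prerequisites: {sorted(unknown)}")
    remaining = {name: set(pre) for name, pre in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # Everything whose prerequisites are all satisfied can run now, in parallel.
        wave = sorted(n for n, pre in remaining.items() if not pre)
        if not wave:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(wave)
        for n in wave:
            del remaining[n]
        for pre in remaining.values():
            pre.difference_update(wave)
    return waves


print(execution_waves({"A": [], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["C", "D"]}))
# [['A'], ['B', 'C', 'D'], ['E']]
```

Each inner list contains sections with no edges between them, so the whole list can be dispatched at once. A cycle or a reference to an undefined section raises an error instead of producing a bad order.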
Right now, the human is the orchestrator. You write an action plan, open multiple tabs, paste prompts, watch for completion, then trigger the next step. For a 7-section implementation like the workspace update pipeline, that’s 30+ minutes of manual sequencing - not counting the mental overhead of tracking what’s done, what’s running, and what’s next.
The deeper issue: every time you start a new tab, the AI has a fresh context window. It doesn’t know what the previous tab did unless you tell it. So you’re also the context bridge - summarizing prior work, pointing to changed files, restating decisions. That’s where mistakes creep in. You forget a dependency, skip a section, or feed stale context.
What I Want
An orchestrator that:
- Reads an action plan and understands the dependency graph
- Dispatches fresh AI sessions per section (clean context, no accumulated garbage)
- Pre-loads the right artifacts into each session (task plan, prior outputs, decisions)
- Tracks state on disk so it survives crashes and can resume
- Lets me review and approve the execution order before it runs
- Runs overnight or in the background without babysitting
I already write the action plans. I already define the dependencies in natural language. I just need something that executes them programmatically.
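To make the "context bridge" concrete, here is a minimal Python sketch of how the orchestrator could assemble a fresh session's prompt. It assumes each finished section leaves a short text summary behind. The prompt includes only the task plan, the outputs of the section's declared prerequisites, and recorded decisions. `Section`, `build_dispatch_prompt`, and the prompt layout are hypothetical and do not come from any SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One section of an action plan (hypothetical shape)."""
    name: str
    task_plan: str
    depends_on: list[str] = field(default_factory=list)


def build_dispatch_prompt(section: Section, outputs: dict[str, str],
                          decisions: list[str]) -> str:
    """Assemble a fresh-session prompt from only the artifacts this section needs.

    outputs maps a finished section's name to the summary it left behind;
    decisions is the running list of choices later sections must respect.
    """
    missing = [d for d in section.depends_on if d not in outputs]
    if missing:
        # Fail loudly instead of dispatching with stale or absent context.
        raise ValueError(f"{section.name} is waiting on: {', '.join(missing)}")
    parts = [f"# Task: {section.name}", section.task_plan.strip()]
    if section.depends_on:
        parts.append("## Output of prerequisite sections")
        parts.extend(f"### {d}\n{outputs[d].strip()}" for d in section.depends_on)
    if decisions:
        parts.append("## Decisions in force")
        parts.extend(f"- {item}" for item in decisions)
    return "\n\n".join(parts)
```

Outputs from unrelated sections are left out on purpose, which keeps each session's context clean. A missing prerequisite raises an error, so feeding stale context becomes a visible failure rather than a silent mistake.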
What’s Currently Available
Pi SDK vs. Claude Agent SDK
Both are from Anthropic. The Pi SDK is the internal agent harness that powers Claude Code - it manages the core loop (tool dispatch, context management, permission checking, session state). The Agent SDK is the public wrapper that lets you launch Claude Code sessions from your own code. Same engine, different access levels.
|  | Pi SDK (internal) | Agent SDK (public) |
|---|---|---|
| access | Not directly importable - you adapt/fork it | pip install claude-agent-sdk |
| what it is | The engine inside Claude Code | A wrapper that launches that engine |
| control level | Full - tool registration, dispatch rules, loop control, model routing | Session-level - launch, configure, collect output |
Why Pi matters for orchestration. The Agent SDK treats each session as a black box - you launch it, it runs, you get the result. The orchestrator is a dumb scheduler: “run A, wait, run B.” With Pi-level access, the orchestrator can be smart: observe that A used 80% of its budget, downgrade the model for B, notice B is stuck, run a diagnostic, discover B failed because A missed something, re-run A with a narrower scope, then continue. The intelligence lives inside the dispatch loop, not outside it.
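The "smart" loop described above boils down to policy functions that the orchestrator evaluates between units. The sketch below is a minimal Python version. The 80% threshold comes from the example above. The model tier names, the 15-minute stuck timeout, and the `UnitReport` fields are hypothetical placeholders, and nothing here calls the Pi SDK or the Agent SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    DIAGNOSE = "diagnose"
    RERUN_UPSTREAM = "rerun_upstream"


@dataclass
class UnitReport:
    """What the loop observed about the last unit (hypothetical fields)."""
    budget_used: float                         # fraction of the unit's budget spent, 0.0-1.0
    minutes_without_progress: int
    failed_prerequisite: Optional[str] = None  # filled in by a diagnostic run


MODEL_TIERS = ["large", "medium", "small"]  # placeholder names, most to least capable


def next_model(current: str, report: UnitReport, threshold: float = 0.8) -> str:
    """Step down one tier for the next unit if the last one burned most of its budget."""
    i = MODEL_TIERS.index(current)
    if report.budget_used >= threshold and i + 1 < len(MODEL_TIERS):
        return MODEL_TIERS[i + 1]
    return current


def next_action(report: UnitReport, stuck_after: int = 15) -> Action:
    """Decide what the loop does next, checked in priority order."""
    if report.failed_prerequisite is not None:
        return Action.RERUN_UPSTREAM  # e.g. re-run A with a narrower scope
    if report.minutes_without_progress >= stuck_after:
        return Action.DIAGNOSE
    return Action.CONTINUE
```

Because these are plain functions of observed state, the decisions can be unit-tested and logged without launching a session.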
GSD v2
gsd-build/gsd-2 is the closest thing to what I want that already exists. It’s a standalone TypeScript CLI (1,493 stars, MIT-licensed, very active) that orchestrates multi-step coding projects using a state machine.
Key patterns worth stealing:
- Disk-as-state-machine. All state lives in a .gsd/ directory on disk, reconstructed each dispatch cycle. No persistent in-memory state. This is what enables crash recovery, multi-terminal steering, and session resumption.
- Fresh session per unit. Every task gets a clean context window. The dispatch prompt pre-loads exactly the right artifacts.
- Declarative dispatch table. Array of rule objects evaluated in order. Replaces brittle if/else chains with something inspectable.
- The “iron rule.” A task must fit in one context window. If it can’t, split it into two.
- Complexity classifier + model routing. Cheap models for simple work, expensive ones for complex. Rolling history for adaptive learning.
- Crash recovery. Lock file tracks current unit. On crash, the system reads the surviving session log, synthesizes a recovery briefing, resumes.
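The following is a minimal Python sketch of how three of these patterns fit together: the disk-as-state-machine, the declarative dispatch table, and the lock file. The file layout is a hypothetical illustration: `dag.json`, a `done/` folder of per-section summaries, and `current.lock`. It is not GSD's actual .gsd/ format. Every name below is likewise hypothetical and is not GSD's API.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class State:
    """Snapshot rebuilt from disk on every cycle; nothing survives in memory between cycles."""
    done: set[str]
    deps: dict[str, list[str]]    # section -> prerequisites (the approved DAG)
    locked_unit: Optional[str]    # unit named in the lock file, if a previous run crashed


def load_state(root: Path) -> State:
    deps = json.loads((root / "dag.json").read_text())
    done = {p.stem for p in (root / "done").glob("*.md")}
    lock = root / "current.lock"
    locked = lock.read_text().strip() if lock.exists() else None
    return State(done=done, deps=deps, locked_unit=locked)


def ready(s: State) -> list[str]:
    return sorted(u for u, pre in s.deps.items()
                  if u not in s.done and all(p in s.done for p in pre))


@dataclass
class Rule:
    name: str
    matches: Callable[[State], bool]
    action: Callable[[State], str]  # describes what to dispatch next


# Evaluated top to bottom; the first matching rule wins.
RULES = [
    Rule("recover", lambda s: s.locked_unit is not None and s.locked_unit not in s.done,
         lambda s: f"recover {s.locked_unit}"),
    Rule("complete", lambda s: s.done >= set(s.deps),
         lambda s: "stop: all sections done"),
    Rule("dispatch", lambda s: bool(ready(s)),
         lambda s: f"dispatch {ready(s)[0]}"),
    Rule("blocked", lambda s: True,
         lambda s: "stop: nothing is ready (cycle or missing prerequisite?)"),
]


def decide(root: Path) -> str:
    state = load_state(root)
    rule = next(r for r in RULES if r.matches(state))
    return rule.action(state)
```

`decide` rebuilds its `State` from files on every call. As a result, killing the process at any point loses nothing: the next invocation reads the same lock file and matches the recovery rule first. A lock naming a unit that has already finished is treated as stale and falls through to the other rules.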
GSD is built on the Pi SDK - the same internal harness that powers Claude Code. Its patterns aren’t just conceptually transferable; they’re built on the same foundation I’d use.
What GSD doesn’t do: workspace distribution, product management pipelines, cross-project orchestration, human-in-the-loop governance. It’s a single-project coding executor, not a business operations platform.
The Hard Part: Parsing Natural Language Dependencies
Action plans define dependencies in natural language:
## Section B: Hook Implementation
Dependencies: Section A (file structure must exist)
Can parallel with: Nothing - sequential
## Section C: Agent Updates
Dependencies: Section A
Can parallel with: Section D
A parser can extract most of this. But natural language is ambiguous. “Depends on Section A” might mean “needs Section A’s files to exist” or “needs Section A’s tests to pass” or “references a convention established in Section A.” Implicit dependencies - where Section C reads a file that Section B creates but nobody wrote that down - are invisible to a parser.
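Here is a minimal sketch of the "good enough" parser. It assumes headings and field labels follow the exact shape of the example above: `## Section X:`, `Dependencies:`, and `Can parallel with:`. It keeps only the section references. The qualifier "(file structure must exist)" is therefore discarded, which is exactly the kind of nuance a parser cannot interpret. `parse_plan` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^##\s+Section\s+(\w+)\s*:")
FIELD = re.compile(r"^(Dependencies|Can parallel with)\s*:\s*(.*)$", re.IGNORECASE)
SECTION_REF = re.compile(r"\bSection\s+(\w+)\b")


def parse_plan(text: str) -> dict[str, dict[str, list[str]]]:
    """Extract the explicit dependency lines; implicit dependencies stay invisible."""
    plan: dict[str, dict[str, list[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADING.match(line):
            current = m.group(1)
            plan[current] = {"depends_on": [], "parallel_with": []}
        elif current and (m := FIELD.match(line)):
            key = "depends_on" if m.group(1).lower() == "dependencies" else "parallel_with"
            # "Nothing - sequential" contains no section reference, so it yields [].
            plan[current][key] = SECTION_REF.findall(m.group(2))
    return plan
```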
Codex as a DAG Validator
Instead of demanding perfectly machine-readable formats, use a second AI to validate the parser’s output:
- Parser reads the action plan, generates a proposed execution DAG
- Codex (or any cheaper model) reviews the DAG against the original PRD
- It flags ordering mistakes, missing dependencies, and parallelism gaps
- I review the validated DAG, adjust if needed, approve
- Orchestrator executes the approved DAG
This is a good tradeoff. The parser can be “good enough” - it doesn’t need to handle every edge case because the validation step catches what it misses. That lowers the adoption bar significantly. I don’t need to rewrite action plans in YAML or invent a new format. Write them the way I already do, and the system handles the translation.
The Codex review is cheap - it’s reading two documents (the DAG and the PRD) and looking for contradictions. A single API call, not an agentic session.
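Because the review is a single completion, it can be a plain function. In this sketch, `complete` is a hypothetical stand-in for whatever client makes the call, whether Codex or another cheaper model. The prompt wording and the JSON issue schema are assumptions chosen to mirror the three kinds of problems the validation step is meant to flag.

```python
import json
from typing import Callable

ISSUE_KINDS = {"ordering", "missing_dependency", "parallelism"}


def review_dag(dag: dict[str, list[str]], prd: str,
               complete: Callable[[str], str]) -> list[dict]:
    """One non-agentic model call: compare the DAG to the PRD and return flagged issues."""
    prompt = (
        "Compare this execution DAG against the PRD. Reply with only a JSON array of "
        'objects like {"kind": "ordering|missing_dependency|parallelism", '
        '"section": "...", "reason": "..."}. Reply [] if nothing is wrong.\n\n'
        f"DAG (section -> prerequisites):\n{json.dumps(dag, indent=2)}\n\nPRD:\n{prd}"
    )
    issues = json.loads(complete(prompt))
    if not isinstance(issues, list):
        raise ValueError("reviewer did not return a JSON array")
    # Keep only issues in the categories the review is supposed to cover.
    return [i for i in issues if isinstance(i, dict) and i.get("kind") in ISSUE_KINDS]
```

Forcing a structured reply keeps the approval step mechanical: the flagged issues can be displayed next to the proposed DAG before I sign off.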
What Changes From Current Workflow
| step | today | with orchestrator |
|---|---|---|
| Write action plan | Manual | Same |
| Determine execution order | Mental model | Parser + codex validation |
| Dispatch sessions | Paste prompts into tabs | Automated via Pi SDK |
| Pre-load context per session | Manually summarize prior work | Orchestrator injects relevant artifacts |
| Track progress | Watch tabs, mental checklist | Disk state + progress file |
| Handle crashes | Re-orient manually | Auto-resume from last checkpoint |
| Overnight execution | Not possible | Run and check results in the morning |
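The "Disk state + progress file" and "Auto-resume" rows can be sketched together. This Python version assumes `run_section` is a hypothetical callable that launches one fresh session and returns that session's summary. The JSON progress file, the two-worker default, and the function name are illustrative choices, not a GSD or SDK convention.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_plan(deps: dict[str, list[str]], run_section: Callable[[str], str],
             progress: Path, max_parallel: int = 2) -> dict[str, str]:
    """Run every ready section in parallel, checkpoint after each wave, resume on rerun."""
    # Anything already recorded in the progress file counts as done.
    done: dict[str, str] = json.loads(progress.read_text()) if progress.exists() else {}
    while any(s not in done for s in deps):
        ready = [s for s, pre in deps.items()
                 if s not in done and all(p in done for p in pre)]
        if not ready:
            raise RuntimeError("nothing runnable: cycle or unknown prerequisite")
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            done.update(zip(ready, pool.map(run_section, ready)))
        # Write-then-rename so a crash never leaves a half-written checkpoint.
        tmp = progress.with_suffix(".tmp")
        tmp.write_text(json.dumps(done, indent=2))
        os.replace(tmp, progress)
    return done
```

To resume, call `run_plan` again: sections already recorded in the progress file are skipped. Checkpointing once per wave keeps the sketch short. Checkpointing each section as it finishes would lose less work on a crash.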
Status: Researching
Decided on the Pi SDK as the foundation (same harness as GSD and Claude Code), with the Agent SDK as a fallback. Next steps: study Pi SDK internals, prototype a minimal dispatch loop, and test the fresh-session-per-section pattern. Not yet at the product brief stage.