The Human Is the Orchestrator (and That's Where Mistakes Creep In)

The Problem

AI coding agents work in single sessions. You give them a prompt, they execute, they’re done. But real implementations have multiple sections that depend on each other - section B needs section A’s output, sections C and D can run in parallel, section E waits for both.
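That dependency shape maps directly onto a graph. Here is a minimal sketch using the Python standard library's `graphlib`. It assumes, hypothetically, that C and D both depend on B, and it groups the sections into waves that could run together:

```python
from graphlib import TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sections into waves; every section in a wave has all its deps done."""
    sorter = TopologicalSorter(deps)  # maps section -> sections it depends on
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as finished before the next
    return waves


# Hypothetical plan: B needs A, C and D both need B, E waits for C and D.
plan = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}, "E": {"C", "D"}}
print(execution_waves(plan))  # [['A'], ['B'], ['C', 'D'], ['E']]
```

Treating each wave as a barrier is the simplest schedule. Its drawback is that a worker can sit idle while one slow section in the wave finishes.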

Right now, the human is the orchestrator. You write an action plan, open multiple tabs, paste prompts, watch for completion, then trigger the next step. For a 7-section implementation like the workspace update pipeline, that’s 30+ minutes of manual sequencing - not counting the mental overhead of tracking what’s done, what’s running, and what’s next.

The deeper issue: every time you start a new tab, the AI has a fresh context window. It doesn’t know what the previous tab did unless you tell it. So you’re also the context bridge - summarizing prior work, pointing to changed files, restating decisions. That’s where mistakes creep in. You forget a dependency, skip a section, or feed stale context.

What I Want

An orchestrator that:

Reads an action plan and understands the dependency graph
Dispatches fresh AI sessions per section (clean context, no accumulated garbage)
Pre-loads the right artifacts into each session (task plan, prior outputs, decisions)
Tracks state on disk so it survives crashes and can resume
Lets me review and approve the execution order before it runs
Runs overnight or in the background without babysitting
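Crash-safe state does not require a database. Below is a minimal sketch of the "tracks state on disk" requirement. It assumes a hypothetical `state.json` with `done` and `running` lists inside a per-run directory. The file is written to a temp path and atomically swapped into place, and state is rebuilt from disk on every start:

```python
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = "state.json"  # hypothetical file name


def load_state(run_dir: Path) -> dict:
    """Rebuild state from disk; no file means a fresh run."""
    path = run_dir / STATE_FILE
    if not path.exists():
        return {"done": [], "running": []}
    return json.loads(path.read_text())


def save_state(run_dir: Path, state: dict) -> None:
    """Write a temp file, then swap it in, so a crash mid-write
    never leaves a half-written state file behind."""
    run_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, run_dir / STATE_FILE)  # atomic rename on the same filesystem


def mark(run_dir: Path, section: str, status: str) -> dict:
    """Record a section as 'running' or 'done' and persist immediately."""
    state = load_state(run_dir)
    if status == "running":
        state["running"].append(section)
    elif status == "done":
        state["running"].remove(section)
        state["done"].append(section)
    else:
        raise ValueError(f"unknown status: {status}")
    save_state(run_dir, state)
    return state


def resume_point(state: dict, order: list[str]) -> list[str]:
    """Sections left to run; an interrupted 'running' section is not done,
    so it simply runs again."""
    return [s for s in order if s not in state["done"]]


with tempfile.TemporaryDirectory() as d:
    run = Path(d)
    mark(run, "A", "running")
    mark(run, "A", "done")
    mark(run, "B", "running")  # ...suppose the process dies here...
    state = load_state(run)    # a new process re-reads the disk
    print(state["running"], resume_point(state, ["A", "B", "C"]))  # ['B'] ['B', 'C']
```

The `running` list is never needed to decide what to run next. It records which section was interrupted, which is the input a recovery briefing would need.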

I already write the action plans. I already define the dependencies in natural language. I just need something that executes them programmatically.

What’s Currently Available

Pi SDK vs. Claude Agent SDK

Both are from Anthropic. The Pi SDK is the internal agent harness that powers Claude Code - it manages the core loop (tool dispatch, context management, permission checking, session state). The Agent SDK is the public wrapper that lets you launch Claude Code sessions from your own code. Same engine, different access levels.

	Pi SDK (internal)	Agent SDK (public)
access	Not directly importable - you adapt/fork it	`pip install claude-agent-sdk`
what it is	The engine inside Claude Code	A wrapper that launches that engine
control level	Full - tool registration, dispatch rules, loop control, model routing	Session-level - launch, configure, collect output

Why Pi matters for orchestration. The Agent SDK treats each session as a black box - you launch it, it runs, you get the result. The orchestrator is a dumb scheduler: “run A, wait, run B.” With Pi-level access, the orchestrator can be smart. It can observe that A used 80% of its budget and downgrade the model for B. Or it can notice that B is stuck, run a diagnostic, discover that B failed because A missed something, re-run A with a narrower scope, and then continue. The intelligence lives inside the dispatch loop, not outside it.
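The decisions described above could take the form of a small function that the loop calls between sessions. This is only a sketch. The `SessionResult` fields, status strings, model tier names, and the 80% threshold are all hypothetical. The "downgrade" assumes the budget is a cost budget worth protecting:

```python
from dataclasses import dataclass

MODEL_TIERS = ["large", "medium", "small"]  # hypothetical tiers, most capable first


@dataclass
class SessionResult:
    """Hypothetical summary of one finished session, as the loop observes it."""
    section: str
    tokens_used: int
    token_budget: int
    status: str  # "ok", "stuck", or "failed"


def plan_next(prev: SessionResult, model: str) -> tuple[str, str]:
    """Decide (action, model) for the next dispatch from what just happened."""
    if prev.status == "stuck":
        return ("diagnose", model)  # investigate before spending more
    if prev.status == "failed":
        # e.g. B failed because A missed something: re-run upstream, narrower
        return ("rerun_upstream_narrower", model)
    if prev.tokens_used >= 0.8 * prev.token_budget:
        i = MODEL_TIERS.index(model)
        return ("continue", MODEL_TIERS[min(i + 1, len(MODEL_TIERS) - 1)])  # step down one tier
    return ("continue", model)


print(plan_next(SessionResult("A", 850, 1000, "ok"), "large"))  # ('continue', 'medium')
```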

GSD v2

gsd-build/gsd-2 is the closest thing to what I want that already exists. It’s a standalone TypeScript CLI (1,493 stars, MIT, very active) that orchestrates multi-step coding projects using a state machine.

Key patterns worth stealing:

Disk-as-state-machine. All state lives in a .gsd/ directory on disk, reconstructed each dispatch cycle. No persistent in-memory state. This is what enables crash recovery, multi-terminal steering, and session resumption.
Fresh session per unit. Every task gets a clean context window. The dispatch prompt pre-loads exactly the right artifacts.
Declarative dispatch table. Array of rule objects evaluated in order. Replaces brittle if/else chains with something inspectable.
The “iron rule.” A task must fit in one context window. If it can’t, split it into two.
Complexity classifier + model routing. Cheap models for simple work, expensive ones for complex work, with a rolling history for adaptive learning.
Crash recovery. A lock file tracks the current unit. After a crash, the system reads the surviving session log, synthesizes a recovery briefing, and resumes.
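Below is a minimal Python sketch of the declarative dispatch pattern. It is not GSD’s actual TypeScript rule format. Each rule pairs a predicate with an action, and rules are checked in order against a state snapshot rebuilt from disk. First match wins, which is an assumption of this sketch. The `lock` and `pending` keys are hypothetical, and the first rule mirrors the lock-file crash recovery above:

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # snapshot rebuilt from disk on each dispatch cycle


@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[State], bool]   # predicate over the current snapshot
    action: Callable[[State], str]  # what to dispatch if the predicate holds


# Hypothetical rules; list order is priority, and the table can be printed for review.
DISPATCH_TABLE = [
    Rule("recover-crash", lambda s: s.get("lock") is not None,
         lambda s: f"recover {s['lock']}"),
    Rule("all-done", lambda s: not s["pending"], lambda s: "stop"),
    Rule("run-next", lambda s: bool(s["pending"]),
         lambda s: f"run {s['pending'][0]}"),
]


def dispatch(state: State) -> str:
    """Return the action of the first rule whose predicate matches."""
    for rule in DISPATCH_TABLE:
        if rule.when(state):
            return rule.action(state)
    raise RuntimeError("no rule matched")  # a gap in the table is a bug


print(dispatch({"lock": "B", "pending": ["B", "C"]}))  # recover B
```

The table itself is data, so adding a rule means inserting one entry at the right priority. There is no nested branch to thread it through.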

GSD is built on the Pi SDK - the same internal harness that powers Claude Code. Its patterns aren’t just conceptually transferable; they’re built on the same foundation I’d use.

What GSD doesn’t do: workspace distribution, product management pipelines, cross-project orchestration, human-in-the-loop governance. It’s a single-project coding executor, not a business operations platform.

The Hard Part: Parsing Natural Language Dependencies

Action plans define dependencies in natural language:

## Section B: Hook Implementation
Dependencies: Section A (file structure must exist)
Can parallel with: Nothing - sequential

## Section C: Agent Updates
Dependencies: Section A
Can parallel with: Section D

A parser can extract most of this. But natural language is ambiguous. “Depends on Section A” might mean “needs Section A’s files to exist” or “needs Section A’s tests to pass” or “references a convention established in Section A.” Implicit dependencies - where Section C reads a file that Section B creates but nobody wrote that down - are invisible to a parser.
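For the format above, a line-oriented parser gets most of the way. This sketch assumes every section header looks like `## Section X:` and that each of the two fields starts on its own line. It keeps the raw dependency prose so a reviewer can still see the parenthetical reason:

```python
import re

HEADER = re.compile(r"^##\s+Section\s+(\w+):")  # "## Section B: Hook Implementation"
SECTION_REF = re.compile(r"Section\s+(\w+)")     # "Section A" anywhere in a field


def parse_plan(text: str) -> dict[str, dict]:
    """Extract declared dependencies and parallelism per section."""
    sections: dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            sections[current] = {"deps": [], "parallel": [], "raw_deps": ""}
        elif current and line.startswith("Dependencies:"):
            raw = line.split(":", 1)[1].strip()
            sections[current]["raw_deps"] = raw  # keep the prose, e.g. "(file structure must exist)"
            sections[current]["deps"] = SECTION_REF.findall(raw)
        elif current and line.startswith("Can parallel with:"):
            sections[current]["parallel"] = SECTION_REF.findall(line)
    return sections


PLAN = """\
## Section B: Hook Implementation
Dependencies: Section A (file structure must exist)
Can parallel with: Nothing - sequential

## Section C: Agent Updates
Dependencies: Section A
Can parallel with: Section D
"""
print(parse_plan(PLAN)["C"])  # {'deps': ['A'], 'parallel': ['D'], 'raw_deps': 'Section A'}
```

This parser misses phrasings like “Sections C and D”, because the regex requires whitespace right after “Section”. It also misses any dependency nobody wrote down.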

Codex as a DAG Validator

Instead of demanding perfectly machine-readable formats, use a second AI to validate the parser’s output:

Parser reads the action plan, generates a proposed execution DAG
Codex (or any cheaper model) reviews the DAG against the original PRD
It flags ordering mistakes, missing dependencies, and parallelism gaps
I review the validated DAG, adjust if needed, approve
Orchestrator executes the approved DAG
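Some parser mistakes can be caught without a model. Below is a sketch of a deterministic pre-check that could run before the model review. It assumes the parser emits a dict shaped like `{section: {"deps": [...], "parallel": [...]}}`, which is a hypothetical shape. The check flags unknown section references, sections marked parallel that directly depend on each other, and cycles:

```python
from graphlib import CycleError, TopologicalSorter


def precheck(plan: dict[str, dict]) -> list[str]:
    """Return human-readable structural problems in a proposed DAG."""
    issues = []
    for name, sec in plan.items():
        for dep in sec["deps"]:
            if dep not in plan:
                issues.append(f"{name}: depends on unknown section {dep}")
        for other in sec["parallel"]:
            if other not in plan:
                issues.append(f"{name}: parallel with unknown section {other}")
            elif other in sec["deps"] or name in plan[other]["deps"]:
                issues.append(f"{name}: marked parallel with {other} but they depend on each other")
    # Only known sections go into the graph; unknown ones were reported above.
    graph = {n: {d for d in s["deps"] if d in plan} for n, s in plan.items()}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        issues.append("cycle: " + " -> ".join(err.args[1]))  # args[1] lists the cycle's nodes
    return issues


bad = {"C": {"deps": ["D"], "parallel": ["D"]}, "D": {"deps": [], "parallel": []}}
print(precheck(bad))  # ['C: marked parallel with D but they depend on each other']
```

This pass only compares direct dependencies. Implicit ones, such as a file that one section reads and another creates, still need the model review.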

This is a good tradeoff. The parser can be “good enough” - it doesn’t need to handle every edge case, because the validation step is there to catch what it misses. That lowers the adoption bar significantly. I don’t need to rewrite action plans in YAML or invent a new format. I write them the way I already do, and the system handles the translation.

The Codex review is cheap - it reads two documents (the DAG and the PRD) and looks for contradictions. A single API call, not an agentic session.
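Because the review is a single request, the code around it stays small. In this sketch, `call_model` is a hypothetical stand-in for whichever client sends a prompt and returns text. The JSON reply format is an assumption that the prompt imposes, not something any API guarantees:

```python
import json
from typing import Callable


def review_dag(dag: dict, prd: str, call_model: Callable[[str], str]) -> list[dict]:
    """One request, no tools, no agent loop: send both documents, get issues back."""
    prompt = (
        "You are reviewing an execution plan.\n"
        "Compare the DAG to the PRD and list ordering mistakes, missing "
        "dependencies, and sections that could run in parallel but don't.\n"
        'Reply with JSON only: [{"kind": ..., "sections": [...], "why": ...}]\n\n'
        f"DAG:\n{json.dumps(dag, indent=2, sort_keys=True)}\n\nPRD:\n{prd}"
    )
    issues = json.loads(call_model(prompt))
    if not isinstance(issues, list):
        raise ValueError("reviewer did not return a JSON list")
    return issues


def fake_model(prompt: str) -> str:
    """Canned reply standing in for a real model call."""
    return '[{"kind": "missing_dependency", "sections": ["C", "B"], "why": "C reads a file B creates"}]'


print(review_dag({"C": ["A"]}, "C reads the hook config that B writes.", fake_model)[0]["kind"])
```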

What Changes From Current Workflow

step	today	with orchestrator
Write action plan	Manual	Same
Determine execution order	Mental model	Parser + Codex validation
Dispatch sessions	Paste prompts into tabs	Automated via Pi SDK
Pre-load context per session	Manually summarize prior work	Orchestrator injects relevant artifacts
Track progress	Watch tabs, mental checklist	Disk state + progress file
Handle crashes	Re-orient manually	Auto-resume from last checkpoint
Overnight execution	Not possible	Run and check results in the morning
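Injecting the relevant artifacts amounts to assembling one prompt per section. Below is a minimal sketch. The function name and its inputs (task plan text, outputs keyed by section, a list of decisions) are hypothetical:

```python
def build_dispatch_prompt(section: str, task_plan: str, deps: list[str],
                          outputs: dict[str, str], decisions: list[str]) -> str:
    """Assemble exactly what one fresh session needs, and nothing else."""
    missing = [d for d in deps if d not in outputs]
    if missing:
        # Fail loudly rather than dispatch a section with stale or absent context.
        raise ValueError(f"Section {section} dispatched before {missing} finished")
    parts = [f"# Task: Section {section}", task_plan.strip()]
    for d in deps:  # only direct dependencies, not every prior section
        parts.append(f"## Output of Section {d}\n{outputs[d].strip()}")
    if decisions:
        parts.append("## Decisions so far\n" + "\n".join(f"- {x}" for x in decisions))
    return "\n\n".join(parts)


# Hypothetical section outputs and decisions, for illustration only.
print(build_dispatch_prompt("C", "Update the agents.", ["A"],
                            {"A": "Created the hooks directory.", "B": "Unrelated."},
                            ["Hooks read JSON config"]))
```

Passing only direct dependencies keeps each window small. The trade-off is that anything a section needs from further upstream has to appear in its direct dependencies’ outputs or in the decisions list.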

Status: Researching

Decided on the Pi SDK as the foundation (same harness as GSD and Claude Code), with the Agent SDK as a fallback. Next steps: study Pi SDK internals, prototype a minimal dispatch loop, and test the fresh-session-per-section pattern. Not yet at the product brief stage.
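A minimal dispatch loop prototype might look like the sketch below. Sections start as soon as their own dependencies finish, rather than waiting for fixed waves. Each call to `run_session` receives only the outputs of that section’s direct dependencies. `run_session` is a hypothetical hook that would launch one fresh Pi or Agent SDK session:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable


def run_plan(deps: dict[str, set[str]],
             run_session: Callable[[str, dict[str, str]], str],
             max_parallel: int = 2) -> dict[str, str]:
    """Run every section once, in dependency order, up to max_parallel at a time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # rejects cyclic plans up front
    outputs: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        running = {}  # future -> section name
        while sorter.is_active():
            for name in sorter.get_ready():  # newly unblocked sections
                prior = {d: outputs[d] for d in deps.get(name, ())}
                running[pool.submit(run_session, name, prior)] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                outputs[name] = fut.result()  # re-raises if the session failed
                sorter.done(name)  # may unblock downstream sections
    return outputs


plan = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}, "E": {"C", "D"}}
result = run_plan(plan, lambda name, prior: f"{name} saw {sorted(prior)}")
print(result["E"])  # E saw ['C', 'D']
```

A failed session stops the whole loop in this sketch. Combining it with disk-backed state would let a restart skip the sections that already finished.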