Orchestration
An orchestrator is an agent that runs other agents. It picks a task, spawns a worker session to do the work, reads the worker’s output, injects follow-up messages when needed, and keeps a record of what it started and why. The platform treats an orchestrator as a long-lived session: it doesn’t time out while its workers are busy, and it survives pod crashes.
This page describes the data model, the six tools an orchestrator calls, and the failure modes. Spawn permission is one kind of permission grant; the grant model itself lives in Permission grants. Session fundamentals live in Sessions and the scheduler; execution details (pod spec, sidecar) live in Architecture Overview.
Capability is per-agent, not a role
There is no binary orchestrator/worker distinction. Every agent starts as a worker. An agent becomes an orchestrator by holding one or more spawn grants: a row in permission_grants whose subject is the agent, whose grant_type is spawn, and whose details names a child agent:

    {
      "agent_subject_id": "<parent agent>",
      "grant_type": "spawn",
      "details": { "child_agent_id": "<child agent>" },
      "scope": "persistent" | "session"
    }

An agent with zero active spawn grants is a pure worker. An agent with one or more is an orchestrator with respect to exactly the named children. Spawning anything else returns agent_not_permitted.
Persistent grants come from the agent edit screen. Session grants come from the runtime request_grant dialog. Both flow through the same POST /api/workspaces/:slug/grants endpoint, which is user-authenticated — no agent can grant itself anything.
Edit screen
A “Can spawn” card on the agent edit page lists every other agent in the workspace with a checkbox. Toggling a box writes or revokes a spawn grant with scope='persistent'. The card sits below the repos card and above the schedule card.
Self-grant is rejected: the details.child_agent_id must not equal the agent_subject_id.
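The grant rules above can be sketched as a small check. This is an illustrative sketch under stated assumptions, not the platform's implementation: the `SpawnGrant` dataclass, its `session_id` and `revoked` fields, and the function names are hypothetical. Only the grant field names and the `agent_not_permitted` error come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpawnGrant:
    """Hypothetical in-memory view of a spawn row in permission_grants."""
    agent_subject_id: str
    child_agent_id: str               # taken from details.child_agent_id
    scope: str                        # "persistent" or "session"
    session_id: Optional[str] = None  # assumed: set only for session-scope grants
    revoked: bool = False             # assumed flag for a revoked grant


def validate_new_grant(grant: SpawnGrant) -> None:
    """Reject self-grants: an agent may not name itself as its child."""
    if grant.child_agent_id == grant.agent_subject_id:
        raise ValueError("self-grant rejected")


def can_spawn(grants, parent_agent_id, child_agent_id, session_id):
    """Return True if an active grant names this child for this parent."""
    for g in grants:
        if g.revoked or g.agent_subject_id != parent_agent_id:
            continue
        if g.child_agent_id != child_agent_id:
            continue
        if g.scope == "persistent":
            return True
        # A session-scope grant only counts for the session it was approved in.
        if g.scope == "session" and g.session_id == session_id:
            return True
    return False


def check_spawn(grants, parent_agent_id, child_agent_id, session_id):
    """Return None when the spawn is allowed, else the error code."""
    if can_spawn(grants, parent_agent_id, child_agent_id, session_id):
        return None
    return "agent_not_permitted"
```

A persistent grant applies to every session of the parent agent, while a session grant is matched against the calling session's id.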
Auto-injected system prompt
When an agent has at least one active spawn grant at session creation, the Job watcher appends a fixed block to the agent’s system prompt. The agent doesn’t have to be told to read the block; it’s there on every session the orchestrator runs.
The exact text the orchestrator sees:
## Other agents you can spawn
You can start and supervise sessions of these agents:
- <agent-slug-1> — <agent-name-1>
- <agent-slug-2> — <agent-name-2>
If you need another agent not on this list, call request_grant with grant_type='spawn' and the child_agent_id you want. The user will see a dialog and can approve or deny.
Tools:
- list_spawnable_agents() → [{id, slug, name}]. Returns the children this session may spawn (resolved from active spawn grants).
- spawn_session({child_agent_id, model?}) → {session_id, status}. Starts a new session of an agent you're permitted to spawn. Use child_agent_id from list_spawnable_agents. Optional `model` overrides the child's default Claude model for this spawn — pass a short name ("sonnet", "opus", "haiku") or a full id (e.g. "claude-sonnet-4-5@20250929"); the value must be in the deployment's enabled-models allowlist. Use as a cost lever — sonnet for routine work, opus for migrations / auth / tenant-isolation / cross-domain refactors. Omitting the field inherits the child agent's configured model.
- read_child_output({child_session_id, after_seq?, limit?}) → {child: {id, status}, events: [{seq, type, payload, timestamp}]}. Pulls the child's event log. Pass after_seq to read only newer events.
- inject_message({child_session_id, text}). Sends text to a child as if it were a user message.
- expect_quiet_for({seconds, reason?}). Tell the platform you'll be silent so the activity watchdog doesn't escalate you as stuck.
Control flow:
- You do not need to poll or block for child progress. The platform will re-wake you by injecting a `user.message` whenever a wake event fires (child reports, child finishes, child goes silent, scheduled tick). End your turn after each decision; do not loop.
- Wakes carry a `driverless: true` flag when no human is watching. In driverless mode, do not ask clarifying questions — if a decision genuinely needs a human, emit a `share` titled "Needs human review: <summary>" and end the turn.
Rules:
- Children run in this workspace. Cross-workspace spawning is not allowed.
- Do not spawn children in a loop. If you need many similar tasks, write them as one prompt to a single child.
- Children inherit this workspace's git installations. They can clone and push the same repos this agent can.

The Job watcher interpolates the bullet list from the current set of active persistent spawn grants at pod creation time. The list is snapshotted to the pod env: adding a new spawn grant while a session is running does not change what the agent sees until the session restarts. Session-scope grants approved during the run don’t appear in the prompt either; the orchestrator discovers those by calling spawn_session and seeing it succeed. The “if you need another agent” line tells the agent to try anyway.
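The interpolation step can be sketched as follows. This is a minimal sketch, assuming grants arrive as dictionaries with `scope`, `revoked`, and a `child` record holding `slug` and `name`; those shapes and both function names are hypothetical. Only the header text and the bullet format come from the prompt quoted above.

```python
EM_DASH = "\u2014"  # separator between slug and name in the prompt's bullets

HEADER = (
    "## Other agents you can spawn\n"
    "You can start and supervise sessions of these agents:"
)


def snapshot_children(grants):
    """Keep the children named by active persistent spawn grants."""
    return [
        g["child"]
        for g in grants
        if g["scope"] == "persistent" and not g.get("revoked", False)
    ]


def render_spawn_block(children):
    """Return the header plus one bullet per child, or '' when there are none."""
    if not children:
        return ""
    bullets = [f"- {c['slug']} {EM_DASH} {c['name']}" for c in children]
    return "\n".join([HEADER, *bullets])
```

Because the snapshot is taken once, a session-scope grant never reaches `render_spawn_block`, which matches the behavior described above.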
Parent and child
Every session has an optional parent.

    ALTER TABLE sessions
      ADD COLUMN parent_session_id UUID REFERENCES sessions(id) ON DELETE SET NULL,
      ADD COLUMN parent_tool_use_id TEXT;

- parent_session_id is NULL for top-level sessions.
- parent_tool_use_id records which specific tool call spawned the child, so a parent with several open children can route messages back to the right conversation turn.
- A child inherits the parent’s workspace. Cross-workspace spawning is rejected at the api layer.
- Cycles are rejected at spawn time: a session cannot spawn an ancestor.
Agent kind — the discriminator
Whether an agent is a worker or an orchestrator is an explicit property on the agent record, not inferred. The discriminator is agents.kind:

    ALTER TABLE agents
      ADD COLUMN kind TEXT NOT NULL DEFAULT 'worker'
        CHECK (kind IN ('worker', 'orchestrator', 'scheduled'));

Three values, three pod-shape families. The enum is finite and binding: adding a fourth kind (say, ingest for long-lived no-spawn data pumps) is a schema change, deliberately.
| kind | Intended use |
|---|---|
| worker | Default. Short-lived per-task invocation. One session per triggering event. |
| orchestrator | Long-lived singleton that plans and commissions. One session per agent, resumed across pod restarts. |
| scheduled | Periodic invocation. Starts on a cron trigger, runs one pass of the agent’s heartbeat, exits. Same pod shape as worker; differs only in how sessions are triggered (the scheduler, not a human). |
kind is orthogonal to permission grants and to runtime_type (the SDK / runtime image — Claude Code, Codex, Gemini, etc.). An orchestrator agent can drive any supported runtime; permissions (spawn, git, etc.) are granted separately. The three are layered:
- runtime_type: which agent SDK runs inside the container
- kind: how the pod lives and dies
- permission_grants: what the agent is allowed to do
One agent, one session (orchestrators only)
For a worker, the mapping is familiar: agent is a class, each session is an instance. Many sessions per agent, disposable.
For an orchestrator, the agent is the session. At most one non-terminal session exists per orchestrator agent, and every trigger — a user message, a child signal, a scheduled wake, a pod restart — routes to that same session. Starting a session for an orchestrator is “find or create”: if a session with status IN ('pending', 'running') already exists, return that id. Don’t create a second row.
    -- Enforce the singleton at the database layer (simplest: application-level
    -- "find or create" in the session-start use case, which already looks up
    -- the agent to read its kind and can short-circuit for orchestrators).

Consequences the rest of the system is built around:
- Restart means resume, not recreate. When the orchestrator pod dies, restartPolicy: OnFailure brings it back. The new pod uses the same SESSION_ID (from the Job’s labels) and calls query({ resume: SESSION_ID, ... }). The Claude Agent SDK rehydrates the transcript from the PVC. The session row’s status stays running through the restart; pod failure is invisible at the session layer.
- Triggers inject, not create. A scheduled wake for an orchestrator doesn’t create a session; it injects a user message (e.g. “heartbeat tick”) into the existing session. A “Run” click on an orchestrator that already has a live session opens that session; it doesn’t start a new one.
- Terminal states are noteworthy. A worker reaching complete is success. An orchestrator reaching complete or failed is a real end: a deliberate shutdown or an unrecoverable crash. The UI surfaces this clearly rather than burying it in a history list.
- Children remain separate. An orchestrator spawns many children over its lifetime. Each child is a normal worker session with its own memory, bound to the orchestrator via parent_session_id. The tree has one permanent root and many transient branches.
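The find-or-create rule for orchestrator sessions can be sketched like this. It is a minimal in-memory sketch: the list-of-dicts session store, the row fields, and the `start_session` name are assumptions made for illustration. The live statuses ('pending', 'running') and the singleton rule are taken from the text above.

```python
import uuid

LIVE_STATUSES = ("pending", "running")


def start_session(sessions, agent):
    """Return a session id for the agent, creating a row only when needed.

    sessions: list of dict rows with 'id', 'agent_id', 'status' (assumed shape).
    agent: dict with 'id' and 'kind'.
    """
    if agent["kind"] == "orchestrator":
        for row in sessions:
            if row["agent_id"] == agent["id"] and row["status"] in LIVE_STATUSES:
                return row["id"]  # find: reuse the live singleton
    # create: workers always land here; orchestrators only when none is live
    row = {"id": str(uuid.uuid4()), "agent_id": agent["id"], "status": "pending"}
    sessions.append(row)
    return row["id"]
```

Once the orchestrator's session reaches a terminal status, the next call creates a fresh row, which is the behavior the idle-cap discussion later relies on.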
Orchestrator idle model
An orchestrator spends most of its wall-clock time idle, and pausing is the right shape. Each turn is one short reasoning cycle: receive an input, decide, act (commit, spawn, share), end the turn. Between turns the pod is alive (the transcript sits on the PVC, the sidecar holds its NATS subscription), but the agent process is parked waiting for the next user message. No polling, no blocking primitives, no tight loops that keep the container hot.
Three states worth naming:
- Active — reasoning, calling tools, writing to its repo, responding to a message. Ends when the model produces a terminal text response.
- Attentive idle — turn has ended cleanly; the orchestrator has active commissioned work in flight. Next input will arrive from the platform (child finished, watchdog fired, scheduled tick) or from a human.
- Quiescent — no active children, no pending work. Nothing to do until the next scheduled tick, a user message, or some external event the platform watches for.
The third state, quiescent, accounts for most of an orchestrator’s wall-clock time. That shapes the pod’s resource requests: an idle orchestrator that reserves 1 GB of memory and 0.5 CPU on the scheduler’s books is wasteful. The request/limit split matters: the request is what the Kubernetes scheduler books, and the limit is the ceiling when the orchestrator is active.
Why the control flow lives in the platform, not the agent
An earlier draft of this design proposed a wait_for_child_signal MCP tool that the orchestrator would call to block within a turn waiting for events. That pushes the control-flow responsibility onto the agent’s reasoning: every orchestrator CLAUDE.md has to encode how to loop, how to thread watermarks, how to distinguish wake reasons. Small prompting mistakes strand the orchestrator. Tokens burn on polling that does no work. Human-in-the-loop moments become workarounds.
The revised model: the orchestrator always ends its turn after each decision. Whatever needs to happen next — wait for a child, act on a report, handle a tick, escalate to a human — is expressed as a user.message the platform injects into the session when the triggering event fires. The orchestrator’s CLAUDE.md only has to describe what to do when woken, not how to keep yourself woken. Operators who aren’t expert prompt authors can still land a working orchestrator.
Server-driven wakes
The platform auto-injects a user.message into an orchestrator’s session whenever an event arrives that warrants reasoning. Five wake kinds, all delivered as structured user messages the agent can parse the same way it reads a human typing:
| kind | Triggered by | Inserted payload shape |
|---|---|---|
| message | A child emits agent.message_to_caller (explicit signal from child to parent) | { kind: "message", from_session_id, from_agent_slug, body, needs_response } |
| state_change | The platform observes a child’s sessions.status transitioning to complete or failed | { kind: "state_change", from_session_id, from_agent_slug, new_status, completed_at, error_message? } |
| watchdog | Server timer detects no events from a child for activity_timeout_seconds (possibly stuck) | { kind: "watchdog", from_session_id, from_agent_slug, seconds_since_last_event, last_status_message? } |
| checkup | Server timer fires on a cadence regardless of child activity (a “just checking in” heartbeat) | { kind: "checkup", snapshot: [{ session_id, slug, seconds_since_last_event, last_status }, …] } |
| heartbeat | Scheduler fires for the orchestrator (per-agent cron) | { kind: "heartbeat", driverless: true, text: <agent.heartbeat_md content> } |
Each wake is a complete, self-contained turn boundary. The orchestrator reads the payload, decides, acts, and ends its turn — possibly without any tool calls if nothing warrants action. The wake payload carries everything the agent needs; no out-of-band state, no remembered watermarks.
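To make the payload shapes concrete, here is a small sketch that validates a wake payload against the table and produces a one-line summary. The function names and the summary strings are hypothetical. The wake kinds and the required field names are copied from the table, with optional fields (those marked `?`) left out of the required sets.

```python
REQUIRED_FIELDS = {
    "message": {"from_session_id", "from_agent_slug", "body", "needs_response"},
    "state_change": {"from_session_id", "from_agent_slug", "new_status", "completed_at"},
    "watchdog": {"from_session_id", "from_agent_slug", "seconds_since_last_event"},
    "checkup": {"snapshot"},
    "heartbeat": {"driverless", "text"},
}


def classify_wake(payload):
    """Return the wake kind after checking that its required fields are present."""
    kind = payload.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown wake kind: {kind!r}")
    missing = REQUIRED_FIELDS[kind] - set(payload)
    if missing:
        raise ValueError(f"{kind} wake missing fields: {sorted(missing)}")
    return kind


def summarize_wake(payload):
    """Build a short human-readable summary (illustrative wording only)."""
    kind = classify_wake(payload)
    if kind == "message":
        return f"message from {payload['from_agent_slug']}: {payload['body']}"
    if kind == "state_change":
        return f"{payload['from_agent_slug']} is now {payload['new_status']}"
    if kind == "watchdog":
        return f"{payload['from_agent_slug']} silent for {payload['seconds_since_last_event']}s"
    if kind == "checkup":
        return f"checkup on {len(payload['snapshot'])} children"
    return "heartbeat tick"
```

Each branch reads only fields present in that kind's payload, which reflects the point that a wake is self-contained.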
The publisher mechanics
All wakes flow through the same internal path: POST /api/internal/sessions/:id/messages with a source: "platform" tag and the structured payload. The api publishes to x1.session.<parent_id>.input; the parent’s sidecar picks it up and injects it as a user.message event, identical in shape to messages typed by a human in the UI, just marked with a source flag the UI can render as a platform-originated wake rather than a human prompt.
Four small server-side watchers produce these:
- Session status watcher. Listens on x1.session.*.events for session.completed / session.failed. Looks up the session’s parent_session_id. If set, emits a state_change wake to the parent.
- Activity watchdog. Periodic sweep (say every 60s) of running sessions whose parent_session_id is not null. If NOW() - last_event_at > activity_timeout_seconds, emits a watchdog wake to the parent. Uses exponential backoff per child to avoid spam (5 / 10 / 20 / 40 / 60 min cap; resets on any real event).
- Checkup timer. Per-orchestrator setting; fires on cadence even when no child is silent. Emits a checkup wake with a lightweight snapshot of all active children.
- Scheduler integration. When cron fires for an orchestrator agent, the scheduler checks for a live session. If one exists, it injects a heartbeat wake rather than creating a second session (which would fail the DB singleton trigger). If no live session exists, it creates one with the heartbeat content as the first message.
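The watchdog's per-child backoff can be sketched as below. This is one reading of the "5 / 10 / 20 / 40 / 60 min cap" sequence, taken here as the gap before each successive watchdog wake for the same silent child. That reading, the `ChildWatch` class, and the function name are assumptions; the numbers and the reset-on-real-event rule come from the list above.

```python
BACKOFF_START_MIN = 5
BACKOFF_CAP_MIN = 60


def backoff_minutes(wakes_sent):
    """Minutes to wait before the next watchdog wake, doubling up to the cap."""
    return min(BACKOFF_START_MIN * 2 ** wakes_sent, BACKOFF_CAP_MIN)


class ChildWatch:
    """Tracks how many watchdog wakes were sent for one silent child."""

    def __init__(self):
        self.wakes_sent = 0

    def on_watchdog_wake(self):
        """Record a wake and return the delay (minutes) until the next one."""
        delay = backoff_minutes(self.wakes_sent)
        self.wakes_sent += 1
        return delay

    def on_real_event(self):
        """Any real event from the child resets the backoff."""
        self.wakes_sent = 0
```

Doubling from 5 would give 80 on the fifth step, so the cap is what produces the final 60 in the sequence.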
Driverless mode (heartbeats and platform wakes)
Every server-injected wake carries a driverless: true flag. That tells the orchestrator “no human is watching this turn in real time; don’t ask clarifying questions.” If the orchestrator genuinely needs a human decision, it emits a share titled Needs human review: <summary> with the tradeoffs and ends the turn; the UI surfaces unresolved human-review shares so an operator can respond when they next check in.
Human-typed messages in the UI land in the session as ordinary user.message events without the driverless flag — the orchestrator knows a person is present and can be more conversational.
Pause is a first-class state
Because the orchestrator always ends its turn after acting, pausing is not an error or an anomaly. If the orchestrator reviews its state and concludes there’s no work that can progress without human input, it emits a share (or just an agent.status with status: "quiescent") and ends. The next wake (scheduled or human) resumes it. No tokens burned in the meantime.
This is what makes the platform habitable for users who aren’t expert prompt authors. They don’t have to write a bulletproof polling loop into their CLAUDE.md. They write “here’s what to do when you’re woken for X,” and the platform takes care of the when.
Pod-shape by kind
The Job watcher reads agents.kind when building the session Job:
| Property | worker | orchestrator | scheduled |
|---|---|---|---|
| activeDeadlineSeconds | 3600 | unset (no hard deadline) | 3600 |
| restartPolicy | Never | OnFailure | Never |
| backoffLimit | 0 | 6 | 0 |
| Idle timeout | 15 min default → exit | 7 days → exit (effectively “never”; see below) | tight (next cron wake) |
| Resources (requests) | cpu 500m, mem 1Gi | cpu 50m, mem 512Mi | cpu 500m, mem 1Gi |
| Resources (limits) | cpu 1, mem 2Gi | cpu 1, mem 2Gi | cpu 1, mem 2Gi |
| Workspace volume | emptyDir | per-session PersistentVolumeClaim | emptyDir |
| Session model | one per trigger | one singleton, resumed | one per cron tick |
| Extra MCP tools exposed | none | spawn / read / message / cancel / report | none |
| System prompt addition | none | “Other agents you can spawn” block | none |
| Wake mechanism | n/a (one-shot) | Server-injected user.message per wake kind (see § Server-driven wakes) | scheduler creates a fresh session per tick |
The orchestrator’s 7-day idle cap is a safety net, not a working duration. Real usage looks like “end turn, wait for next server-injected wake, process it.” Between wakes the Claude Code process is parked but alive, spending zero tokens. The cap only fires if the orchestrator is genuinely abandoned — in which case the session ends cleanly (exit 0) and a future scheduler tick or human action starts a fresh one via the singleton find-or-create path.
All kinds share the same agent container image and the same wire event schema. The difference is the lifetime contract, the pod’s resource footprint, and which tools the agent sees.
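As a rough illustration of how the table could drive Job construction, the sketch below maps `kind` to a fragment of a Kubernetes Job spec. The `POD_SHAPE` dictionary, the `job_spec_fragment` name, and the container name `agent` are hypothetical. The values come from the table, and the field names (`activeDeadlineSeconds`, `backoffLimit`, `restartPolicy`, `resources`) are standard Job and Pod spec fields.

```python
_SHORT_LIVED = {
    "restartPolicy": "Never",
    "backoffLimit": 0,
    "activeDeadlineSeconds": 3600,
    "requests": {"cpu": "500m", "memory": "1Gi"},
}

POD_SHAPE = {
    "worker": _SHORT_LIVED,
    "scheduled": _SHORT_LIVED,  # same pod shape as worker
    "orchestrator": {
        "restartPolicy": "OnFailure",
        "backoffLimit": 6,
        "activeDeadlineSeconds": None,  # unset: no hard deadline
        "requests": {"cpu": "50m", "memory": "512Mi"},
    },
}

LIMITS = {"cpu": "1", "memory": "2Gi"}  # identical for every kind


def job_spec_fragment(kind):
    """Return the kind-dependent part of a Job spec as a plain dict."""
    if kind not in POD_SHAPE:
        raise ValueError(f"unknown agent kind: {kind!r}")
    shape = POD_SHAPE[kind]
    spec = {
        "backoffLimit": shape["backoffLimit"],
        "template": {
            "spec": {
                "restartPolicy": shape["restartPolicy"],
                "containers": [
                    {
                        "name": "agent",  # hypothetical container name
                        "resources": {
                            "requests": dict(shape["requests"]),
                            "limits": dict(LIMITS),
                        },
                    }
                ],
            }
        },
    }
    # Only short-lived kinds get a hard deadline.
    if shape["activeDeadlineSeconds"] is not None:
        spec["activeDeadlineSeconds"] = shape["activeDeadlineSeconds"]
    return spec
```

Volumes, idle timeouts, and tool exposure are left out to keep the sketch short.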
Five operations
Every orchestrator-flavored action reduces to one of five MCP tool calls. The sidecar translates each call into a platform action. These are what the orchestrator does; the server-driven wakes are what arrives at the orchestrator between calls.
1. Spawn a child
    spawn_session({ child_agent_id: "<uuid from list_spawnable_agents>" })

The sidecar POSTs:

    POST /api/internal/sessions
    {
      "workspace_slug": "...",
      "agent_slug": "code-writer",
      "parent_session_id": "...",
      "parent_tool_use_id": "t_042",
      "triggered_by": "orchestrator",
      "initial_prompt": "Refactor the checkout module..."
    }

The api looks up the parent agent’s active spawn grants and checks that one names the requested child agent (either as scope='persistent' or as scope='session' with the current session id). If not, the call returns agent_not_permitted. Otherwise it creates a pending session and the Job watcher picks it up on the next tick.
sequenceDiagram
participant O as Orchestrator agent
participant OS as Orchestrator sidecar
participant A as api
participant JW as Job watcher
participant C as Child pod
O->>OS: spawn_session(agent_slug, prompt)
OS->>A: POST /api/internal/sessions
A->>A: check permission_grants
A->>A: INSERT sessions (pending, parent_session_id=...)
A-->>OS: { session_id }
OS-->>O: { session_id }
A->>JW: next tick
JW->>C: create Job
2. Read a child’s events
    read_child_output({
      child_session_id: "019d...",
      after_seq: 42,
      limit: 500
    })

Returns:

    {
      status: "pending" | "running" | "complete" | "failed",
      last_seq: 57,
      events: [ { seq, type, payload, timestamp }, ... ]
    }

The sidecar handles the call by querying the api’s internal endpoint GET /api/internal/sessions/:id/events?after_seq=N. Events come back oldest-first, up to a server-side cap of 1000 per call. The orchestrator uses last_seq as the next after_seq cursor.
read_child_output is the pull-based inspection path. It complements report_to_parent (below), which is push-based: workers voluntarily send messages when they need attention. An orchestrator can read at any time without the child having to do anything special.
Permission: the parent can read any session in its own workspace whose parent_session_id is the caller’s session id — nothing else. No reading of other orchestrators’ children.
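The cursor handling for read_child_output can be sketched as a short paging loop. This sketch assumes that `last_seq` in the response is the sequence number of the last event returned in that page; the page does not say so explicitly. `read_all_events` and the `fake_reader` stand-in are hypothetical names, and the reader is passed in as a plain callable rather than a real MCP tool binding.

```python
def read_all_events(read_child_output, child_session_id, after_seq=0, limit=500):
    """Drain a child's event log page by page; return (events, final_cursor)."""
    collected = []
    cursor = after_seq
    while True:
        page = read_child_output(
            child_session_id=child_session_id, after_seq=cursor, limit=limit
        )
        collected.extend(page["events"])
        cursor = page["last_seq"]  # next after_seq cursor
        if len(page["events"]) < limit:
            return collected, cursor  # short page: nothing newer to fetch


def fake_reader(log):
    """Stand-in for the real tool, backed by an in-memory list of events."""
    def read(child_session_id, after_seq, limit):
        page = [e for e in log if e["seq"] > after_seq][:limit]
        last = page[-1]["seq"] if page else after_seq
        return {"status": "running", "last_seq": last, "events": page}
    return read
```

The loop runs within a single turn and stops at the first short page, so it does not wait for new events; the returned cursor can be carried into the next wake.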
3. Report to parent (called by the child)
The child agent calls:

    message_caller({
      summary: "I found three call sites that use the old validator. Should I update all of them?",
      ...
    })

The child sidecar publishes to x1.session.{parent_session_id}.input with the caller tagged:

    {
      "text": "I found three call sites...",
      "from_session_id": "019d...",
      "from_agent_slug": "code-writer",
      "request_id": "parent_tool_use_id_from_spawn",
      "options": ["yes, update all", "list them first"]
    }

The parent sidecar injects the message into its agent. The orchestrator sees it as a user message; the UI renders it with a chip showing the child agent’s name and a link to the child session. The request_id matches the parent_tool_use_id from the spawn, so the SDK routes the answer to the right tool call when the orchestrator responds.
report_to_parent is always enabled for a child that has a parent — it doesn’t need a grant.
4. Message a child
    inject_message({
      child_session_id: "019d...",
      text: "Yes, update all three. Commit after each file so we can review."
    })

The sidecar POSTs to the api’s internal endpoint, which publishes to x1.session.{child_id}.input. The child treats the orchestrator’s message exactly like a human operator’s.
Permission check is the same as for read_child_output: the target session must have parent_session_id = orchestrator's session id.
5. Cancellation (today: operator-side only)
There is no cancel_session MCP tool today. To stop a child mid-flight,
an operator uses POST /api/workspaces/:slug/agents/:agentId/sessions/:id/cancel
or the cancel button on the session detail page. An orchestrator that
needs cancellation as a primitive should file a request_grant for an
operator to act, or end its turn and surface a share titled
“Needs cancellation:

    cancel_session({ session_id: "019d..." })

Flips the child’s session row to failed and terminates its pod. The orchestrator can call this on any child it spawned. The platform does not auto-cancel children when the parent completes; orphaned children run until they finish or the reaper catches them. An orchestrator that invokes cancel_session should follow it in the same turn with a structured post-mortem share (see post-mortem convention).
Why no await_children or wait_for_child_signal
Both were proposed in earlier drafts as blocking primitives the orchestrator could call to wait for events within a turn. They’re intentionally absent: waiting is the platform’s job, not the agent’s. The orchestrator ends its turn after acting; the server-driven wake path reawakens it when something meaningful happens. This keeps the orchestrator’s prompt simpler and removes a class of bugs where the blocking tool holds a turn open for hours while consuming reasoning budget.
Post-mortem convention
When an orchestrator calls cancel_session, the next action in the same turn must be a share with a structured post-mortem. The share’s title starts Post-mortem: followed by the child session’s slug or a short summary; the body uses these sections in order:
- Root cause: one sentence
- What happened: 2–4 sentences, narrative
- Evidence: seq numbers from the child’s event stream, or excerpts from read_child_output
- Lessons: what to change in the next attempt’s brief
- Next steps: respawn with narrower scope / defer / block on human input
The share is a first-class artifact already persisted in the workspace’s Shares page (via agent.share events). No new table, no new MCP primitive. Discipline is enforced in the orchestrator’s CLAUDE.md, not in code — but the convention is strict enough that future shared tooling (summaries, dashboards) can query for post-mortems by title prefix.
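A post-mortem share that follows the convention could be assembled as in this sketch. The share shape (a dict with `title` and `body`), the plain-text body layout, and both function names are assumptions. The title prefix and the section names and order come from the convention above.

```python
SECTIONS = ("Root cause", "What happened", "Evidence", "Lessons", "Next steps")
TITLE_PREFIX = "Post-mortem:"


def build_post_mortem(child_label, sections):
    """Return a share dict with the sections in the required order.

    child_label: the child session's slug or a short summary.
    sections: dict mapping each section name to its text.
    """
    missing = [name for name in SECTIONS if not sections.get(name)]
    if missing:
        raise ValueError(f"post-mortem missing sections: {missing}")
    body = "\n\n".join(f"{name}\n{sections[name]}" for name in SECTIONS)
    return {"title": f"{TITLE_PREFIX} {child_label}", "body": body}


def is_post_mortem(share):
    """Title-prefix test that shared tooling could use to find post-mortems."""
    return share["title"].startswith(TITLE_PREFIX)
```

Rejecting a share with a missing section is stricter than the page requires, since the convention is enforced in the orchestrator's CLAUDE.md rather than in code.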
Resume after crash
Orchestrators pin their SDK session id to the platform session id. On pod restart, the agent container reads SESSION_ID from env, passes it to query({ resume: SESSION_ID, ... }), and the Claude Agent SDK rehydrates the conversation from the transcript on the pod’s persistent volume.
Orchestrator pods use per-session PVCs:
    volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: x1-session-{shortSessionId}  # first 12 chars of the session UUID

The PVC is created by the Job watcher when the agent’s kind is orchestrator. Whether the agent currently holds any spawn grants is independent: the PVC backs the SDK transcript’s resume-on-restart contract that all orchestrators rely on. The restartPolicy: OnFailure + backoffLimit: 6 combination lets the pod come back on node failure without the watcher noticing.
Worker pods do not use PVCs. They’re short-lived; a crashed worker is a failed session, not a restart.
What’s persisted
| Kind | Location |
|---|---|
| “Agent X can spawn Y” | permission_grants (grant_type='spawn') |
| “I spawned X” | sessions.parent_session_id on the child |
| “X told me Y” | session_events on the parent (user message with from_session_id) |
| “I told X Y” | session_events on X (user message) |
| “X finished” | session_events.type = 'session.completed' on X |
| “My conversation so far” | Claude Agent SDK transcript on the PVC |
No separate “orchestration log” table. Recovery on restart: re-enumerate children of this session id via SELECT * FROM sessions WHERE parent_session_id = ?, resume the SDK transcript, carry on.
UI rendering
A session detail page shows:
- Its own events in the main stream.
- A Children panel listing direct child sessions with status pills, linking to each child’s detail page.
- In the event stream, user.message events whose payload carries from_session_id render with a child-session chip (agent name, short session id, clickable). They still sort by seq with everything else.
The child session detail page has a breadcrumb back to its parent. No nested stream rendering — the parent’s page is the index, the child’s page is the full log.
Failure modes
Orchestrator pod dies mid-spawn. The child’s sessions row either doesn’t exist yet (transaction rolled back) or exists with status='pending' and no pod. The resumed orchestrator re-enumerates children; the Job watcher picks up the pending row and starts a pod. Idempotency on parent_tool_use_id prevents duplicate spawns: the api rejects a second spawn with the same (parent_session_id, parent_tool_use_id).
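The idempotency rule on (parent_session_id, parent_tool_use_id) can be illustrated with an in-memory ledger. The `SpawnLedger` class, the `DuplicateSpawn` exception, and the generated session ids are hypothetical; the page only states that the api rejects a second spawn with the same pair.

```python
class DuplicateSpawn(Exception):
    """Raised when the same tool call tries to spawn a second child."""


class SpawnLedger:
    """Remembers which (parent session, tool call) pairs already spawned."""

    def __init__(self):
        self._seen = {}
        self._count = 0

    def spawn(self, parent_session_id, parent_tool_use_id):
        key = (parent_session_id, parent_tool_use_id)
        if key in self._seen:
            # Reject the retry and report the session that already exists.
            raise DuplicateSpawn(self._seen[key])
        self._count += 1
        session_id = f"child-{self._count}"  # placeholder id for the sketch
        self._seen[key] = session_id
        return session_id
```

A resumed orchestrator that replays the same tool call gets the rejection instead of a second child, while a new tool call from the same parent spawns normally.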
Child sidecar dies while running. The parent stops receiving report_to_parent messages. A reaper in the api flips children whose pod has been gone more than N minutes to status='failed' and emits a synthetic session.failed event — which the session-status watcher picks up and turns into a state_change wake for the parent. The parent gets the same wake it would have gotten from a clean exit; from its perspective, the child finished.
Orchestrator dies with children still running. Children keep running; their events keep flowing to NATS and landing in session_events. When the orchestrator’s pod restarts via restartPolicy: OnFailure, it resumes the SDK transcript from the PVC. Any wake events that fired while it was down were buffered as user.message rows in session_events; the resumed agent processes them in order on its next turn.
Infinite spawn loop. Depth is capped at one for now: spawn_session rejects calls from any session whose parent_session_id is non-null. Deep nesting is out of scope until we have a use case.
Cross-workspace spawn. Rejected at the api layer. spawn_session returns workspace_mismatch if the requested agent’s workspace doesn’t match the orchestrator’s.
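The two structural checks above (depth cap and workspace match) can be sketched as a pre-check that runs before the grant lookup. The function name, the dict shapes, and the `spawn_depth_exceeded` code are hypothetical, since this page does not name the error returned for the depth cap. `workspace_mismatch` is the code given above.

```python
def precheck_spawn(caller_session, caller_agent, child_agent):
    """Return an error code, or None when the spawn may proceed to grant checks.

    caller_session: dict with 'parent_session_id' (None for top-level sessions).
    caller_agent / child_agent: dicts with 'workspace_id' (assumed field name).
    """
    # Depth is capped at one: a session that itself has a parent cannot spawn.
    if caller_session["parent_session_id"] is not None:
        return "spawn_depth_exceeded"  # hypothetical code, not from this page
    # Children must live in the caller's workspace.
    if child_agent["workspace_id"] != caller_agent["workspace_id"]:
        return "workspace_mismatch"
    return None
```

The order of the two checks is a choice made for the sketch; the page does not say which one the api evaluates first.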
Grant revoked mid-session. The allowlist is snapshotted into pod env when the Job is created. Revoking a spawn grant while a session is running does not retroactively disallow spawns already enumerated in the agent’s system prompt. It does gate future spawn_session calls at the api — the next spawn returns agent_not_permitted even if the agent’s prompt still lists the now-removed child. The agent may be confused. Documented, not fixed.
Dangling grant references. If the child agent named in a spawn grant’s details is deleted, the delete does not cascade to the grant row: child_agent_id lives inside the jsonb details column, so it is not a foreign key. A daily sweep in the permissions domain revokes those grants by setting revoked_at.
Out of scope
Intentional non-goals:
- Multi-level nesting. Orchestrators cannot spawn orchestrators. Two levels only.
- Cross-workspace orchestration. A worker spawned by an orchestrator lives in the same workspace.
- Broadcast messaging. No “message all children” primitive. Orchestrators loop over session ids.
- Automatic child cancellation on parent completion. The orchestrator explicitly calls cancel_session if it wants children stopped.
Permission model
Orchestrators run with the same identity as the user who started them. Spawning a child uses the same installation_id resolution as any other session: the child’s pod gets git credentials via the same sidecar → api → GitHub App path. There is no separate “orchestrator service account.”