Work-Unit Orchestration
Contents
- Purpose
- Core Concept: The Work Unit
- When VibeFlow Creates Work Units
- Lifecycle
- Quality Assurance
- Pre-Flight Gate
- Handoff Triage
- Resource and Progress Tracking
- Sub-Agent Guardrails
- Mapping to the Rest of the Spec
Purpose
This document defines how VibeFlow decomposes a task into scoped, file-backed work
units, enforces quality gates on each, and keeps orchestration resources easy to
track. It is the operational mechanism behind AGENT_ORCHESTRATION_POLICY.md: that
document sets the policy (orchestrator role, confidence thresholds, debate,
anti-hallucination), while this document defines the file structure, gates, and tracking
ledger that make the policy observable and auditable.
The model is adapted from the tentacle / OctoGent pattern: one orchestrator, many scoped work units, each persisted as files so nothing is lost between agent boundaries.
Core concept: the work unit
A work unit is a scoped slice of the task stored as files under the canonical
.vibeflow/ tree (no new top-level directories — keeps the minimal-footprint principle
in MASTER_SPEC.md):
.vibeflow/workunits/<name>/
  CONTEXT.md   # scope, constraints, key files the agent needs (the dispatch prompt)
  evidence/    # recorded gate output as JSON: <engine>.result.json, investigation.json
Today the orchestrator writes exactly CONTEXT.md (the per-unit dispatch prompt) and the
evidence/ folder (<engine>.result.json from each dispatch, investigation.json when a
sub-1.0 confidence run is investigated). Per-unit STATE — status, confidence, gates, owner,
skills, resources, evidence paths — does NOT live in a per-unit meta.json; it lives
centrally in .vibeflow/WORKFLOW_STATE.json (see Resource and progress tracking below).
Planned (not yet implemented): TODO.md (atomic checkbox deliverables) and HANDOFF.md
(agent results + evidence on completion). The shape a planned meta.json would carry — and
which today is held inside each work_units[] entry of WORKFLOW_STATE.json — is:
{
  "name": "auth-refactor",
  "scope": ["src/auth/**"],
  "status": "pending",
  "confidence": 1.0,
  "depends_on": [],
  "evidence_owner": "test-engineer",
  "implementation_owner": "backend-engineer",
  "acceptance_signal": "all auth tests pass and login flow works",
  "resources": { "agents": 0, "tokens": 0, "cost_usd": 0.0, "wall_seconds": 0 }
}
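For readers consuming these entries from TypeScript, the shape above could be typed roughly as follows. This is a sketch only: the names `WorkUnitMeta`, `UnitResources`, and `UnitStatus` are hypothetical, and the status union is an assumption taken from the status values listed under Resource and progress tracking.

```typescript
// Hypothetical typing of the planned meta.json shape. None of these type
// names come from the VibeFlow source; they are illustrative only.
type UnitStatus = "pending" | "running" | "verifying" | "done" | "blocked";

interface UnitResources {
  agents: number;
  tokens: number;
  cost_usd: number;
  wall_seconds: number;
}

interface WorkUnitMeta {
  name: string;
  scope: string[];          // file globs owned by this unit
  status: UnitStatus;
  confidence: number;       // 1.0 means no open ambiguity
  depends_on: string[];     // names of other units
  evidence_owner: string;
  implementation_owner: string;
  acceptance_signal: string;
  resources: UnitResources;
}

// The example entry from this section, checked against the type.
const example: WorkUnitMeta = {
  name: "auth-refactor",
  scope: ["src/auth/**"],
  status: "pending",
  confidence: 1.0,
  depends_on: [],
  evidence_owner: "test-engineer",
  implementation_owner: "backend-engineer",
  acceptance_signal: "all auth tests pass and login flow works",
  resources: { agents: 0, tokens: 0, cost_usd: 0.0, wall_seconds: 0 },
};
```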
When VibeFlow creates work units
Follow the minimal-footprint principle — do not create work-unit files for trivial tasks.
1-2 files, single concern → direct execution, no work unit
3+ files, single module → optional single work unit for tracking
3+ files, multiple modules → REQUIRED: one non-overlapping work unit per module
Multi-phase / delegated agents → REQUIRED: one work unit per delegated agent
Bug with multiple hypotheses → recommended: one work unit per hypothesis
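The rules above can be read as a small decision function. The sketch below is illustrative: `TaskShape`, `Decision`, and `decideDecomposition` are hypothetical names, and the precedence between rules (delegation first, then hypotheses, then file and module counts) is an assumption, since the rules themselves do not state an order.

```typescript
// Hypothetical encoding of the decomposition rules; not taken from VibeFlow.
type Decision =
  | "direct"                // no work unit
  | "optional-single-unit"  // tracking only
  | "unit-per-module"       // required
  | "unit-per-agent"        // required
  | "unit-per-hypothesis";  // recommended

interface TaskShape {
  files: number;            // files the task is expected to touch
  modules: number;          // distinct modules among those files
  delegatedAgents: number;  // agents the orchestrator plans to delegate to
  hypotheses: number;       // competing bug hypotheses, if any
}

function decideDecomposition(t: TaskShape): Decision {
  // Assumed precedence: the two REQUIRED rules are checked before the softer ones.
  if (t.delegatedAgents > 0) return "unit-per-agent";
  if (t.files >= 3 && t.modules > 1) return "unit-per-module";
  if (t.hypotheses > 1) return "unit-per-hypothesis";
  if (t.files >= 3) return "optional-single-unit";
  return "direct";
}
```

Under these assumptions, a two-file fix in one module resolves to `"direct"`, while a five-file change across two modules resolves to `"unit-per-module"`.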
Non-overlapping scopes are mandatory. Two work units must never declare overlapping file scopes — parallel agents would otherwise overwrite each other.
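One way to check this rule at planning time is a pairwise comparison of declared scopes. The sketch below is deliberately simplified and is not VibeFlow's implementation: it understands only the trailing `/**` glob form used in this document and treats every other entry as a literal path. `ScopedUnit`, `scopePrefix`, `overlaps`, and `findOverlaps` are hypothetical names.

```typescript
interface ScopedUnit {
  name: string;
  scope: string[];
}

// Reduce "src/auth/**" to the directory prefix "src/auth/". Any entry that
// does not end in "/**" is kept unchanged and treated as a literal path.
function scopePrefix(glob: string): string {
  return glob.endsWith("/**") ? glob.slice(0, -2) : glob;
}

// Two entries overlap when one is a directory containing the other,
// or when they name exactly the same path.
function overlaps(a: string, b: string): boolean {
  const pa = scopePrefix(a);
  const pb = scopePrefix(b);
  if (pa.endsWith("/") && pb.startsWith(pa)) return true;
  if (pb.endsWith("/") && pa.startsWith(pb)) return true;
  return pa === pb;
}

// Return every pair of unit names whose scopes collide.
function findOverlaps(units: ScopedUnit[]): Array<[string, string]> {
  const conflicts: Array<[string, string]> = [];
  units.forEach((a, i) => {
    for (const b of units.slice(i + 1)) {
      const clash = a.scope.some((sa) => b.scope.some((sb) => overlaps(sa, sb)));
      if (clash) conflicts.push([a.name, b.name]);
    }
  });
  return conflicts;
}
```

A planner could refuse to dispatch while `findOverlaps` returns a non-empty list. A production check would need a real glob matcher to handle patterns such as `src/**/*.ts`.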
Lifecycle
Clarify → Plan → Execute → Verify → Goal-eval → Close
Clarify : spec is made implementation-ready before any decomposition
Plan : decompose into non-overlapping work units; write CONTEXT.md (plus TODO.md once that planned file is implemented)
Execute : dispatch one agent per work unit; independent units run in parallel
Verify : run quality gates on each unit; record evidence
Goal-eval : orchestrator checks the overarching goal; loop for gaps or proceed
Close : merge, runtime-verify, persist learnings, clean up
Quality assurance
Decision confidence gate
Before creating, dispatching, merging, or closing a work unit, confidence must be 1.0.
Confidence < 1.0 means the orchestrator is still guessing.
If confidence < 1.0:
- stop implementation/merge/close decisions for that scope
- split the ambiguity into atomic research questions
- dispatch read-only research/validation agents on the strongest model
- record evidence + rejected alternatives in HANDOFF.md
- proceed only at confidence 1.0 or with an explicit, logged user override
This composes with the risk-based thresholds in AGENT_ORCHESTRATION_POLICY.md: those
thresholds decide when bounded investigation is required; this gate forbids merging or
closing on a guess.
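The gate logic can be sketched as a single pure function. The names `GateInput`, `GateResult`, and `decisionGate` are hypothetical, and the shape of the override record is an assumption; the only rules taken from this document are "proceed at confidence 1.0" and "proceed on an explicit, logged user override".

```typescript
interface GateInput {
  confidence: number;
  // Assumed shape of a logged override; the real record may differ.
  userOverride?: { by: string; reason: string };
}

type GateResult =
  | { proceed: true; via: "confidence" | "override" }
  | { proceed: false; action: "investigate" };

function decisionGate(input: GateInput): GateResult {
  if (input.confidence >= 1.0) {
    return { proceed: true, via: "confidence" };
  }
  // An override only counts if it carries a non-empty reason to log.
  if (input.userOverride && input.userOverride.reason.trim() !== "") {
    return { proceed: true, via: "override" };
  }
  // Anything else means the orchestrator is still guessing: stop and research.
  return { proceed: false, action: "investigate" };
}
```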
Verification gates
Each work unit’s output passes these gates before it is accepted. Build, lint, test, and review are mandatory; docs and QA-audit are conditional.
Build : compiles / type-checks (never skip)
Lint : style, unused imports, formatting (never skip)
Test : logic, regressions, contracts (never skip)
Review : security, design, scope creep (never skip — separate review agent)
Context isolation (ADR-001): reviewer receives ONLY goal + spec + diff.
No dispatch prompt, no self-report, no workflow reasoning chain.
`buildReviewerPrompt()` enforces this — do not bypass.
Docs : README/API/JSDoc/CHANGELOG sync (skip only for internal refactors)
QA audit : cross-check by a different agent (high-risk changes only: auth/data/billing/infra)
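As a rough illustration of how the mandatory and conditional gates combine, the function below derives the gate list for a change. It is a sketch under stated assumptions: `ChangeProfile` and `requiredGates` are hypothetical names, and describing a change by the areas it touches is an assumed input format, not something this document specifies.

```typescript
type Gate = "build" | "lint" | "test" | "review" | "docs" | "qa-audit";

interface ChangeProfile {
  internalRefactor: boolean; // true when no public surface or docs change
  touches: string[];         // assumed labels such as "auth" or "ui"
}

// High-risk areas named in the QA audit rule.
const HIGH_RISK = new Set(["auth", "data", "billing", "infra"]);

function requiredGates(change: ChangeProfile): Gate[] {
  // Build, lint, test, and review are never skipped.
  const gates: Gate[] = ["build", "lint", "test", "review"];
  // Docs sync is skipped only for internal refactors.
  if (!change.internalRefactor) gates.push("docs");
  // QA audit applies only to high-risk changes.
  if (change.touches.some((area) => HIGH_RISK.has(area))) gates.push("qa-audit");
  return gates;
}
```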
Evidence requirement
A gate is passed only when VibeFlow holds the recorded proof — never on an agent’s claim
that “tests pass” or “lint is clean”. The orchestrator (or hooks) runs the command and
stores output under evidence/.
- "all tests pass" is not evidence; the recorded test command + pass/fail counts are.
- A gate that was not run is recorded as: "not proven yet — run <command>".
- A DONE handoff with no evidence ledger is treated as AMBIGUOUS and requires triage.
Verifiable evidence format (ADR-004)
vf verify warns when evidence strings are free-text claims rather than machine-verifiable
artifacts. Accepted formats:
| Format | Example |
|---|---|
| Command output capture | bun test 2>&1 \| tail -3 → "12 pass, 0 fail" |
| File:line reference | src/gates.ts:47 — added isVerifiableEvidence() |
| Commit SHA | commit abc1234 — feat: add goalEval gate |
| Test name:result | pending-hooks > clearPending: removes all entries [0.04ms] |
| CI run URL | https://github.com/magicpro97/vibeflow/actions/runs/123 |
| Git command output | git diff --stat origin/main HEAD → 3 files changed |
Rejected: "done", "tests pass", "implementation complete", any string under 10 chars.
Use vf units evidence <name> --add 'bun test 2>&1 | tail -3 → "<output>"' to record.
Phase 2 (current): gate failure — vf verify exits 1 when evidence is free-text.
Escape hatch: vf verify --allow-unverified-evidence / vf orchestrate --allow-unverified-evidence.
This is the file-backed enforcement of the policy rule "no verification, no completion"
(`MASTER_SPEC.md`) and ties directly to the hook `final-verify` and `skill-compliance`
events in `HOOKS_AND_GUARDRAILS.md`.
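To make the accepted and rejected evidence formats concrete, here is a heuristic classifier. It is not the `isVerifiableEvidence()` function referenced in the table; `looksVerifiable` is a hypothetical name, and the regular expressions are assumptions derived only from the example rows and the rejection list in this section.

```typescript
// Free-text claims that are rejected outright.
const REJECTED_CLAIMS = new Set(["done", "tests pass", "implementation complete"]);

// Assumed patterns, one per accepted format in the table.
const EVIDENCE_PATTERNS: RegExp[] = [
  /→/,                          // command followed by its captured output
  /\S+\.\w+:\d+/,               // file:line reference, e.g. src/gates.ts:47
  /\bcommit [0-9a-f]{7,40}\b/i, // commit SHA
  /\[\d+(\.\d+)?ms\]/,          // test name with a recorded timing
  /https?:\/\/\S+/,             // CI run URL
];

function looksVerifiable(evidence: string): boolean {
  const text = evidence.trim();
  if (text.length < 10) return false;                         // too short to be proof
  if (REJECTED_CLAIMS.has(text.toLowerCase())) return false;  // bare claim
  return EVIDENCE_PATTERNS.some((pattern) => pattern.test(text));
}
```

A string such as "all good, shipped it" passes the length check but matches no pattern, so it is still classified as free text.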
## Pre-flight gate
Before any unit is dispatched the orchestrator runs a **3-layer gate** for the
target engine (`src/preflight-delegate.ts`):
```text
1. presence → is the engine binary on PATH?
2. auth → is the engine authenticated for this user?
3. quota → does the engine have usable capacity right now?
```
If any layer fails the orchestrator auto-falls-back to the next engine that
passes all three layers (claude → codex → copilot by default; see
AGENT_ORCHESTRATION_POLICY.md for the priority). A unit that finds no engine
ready is recorded as BLOCKED and surfaced on the triage banner
(WEB_UI_DESIGN.md) — the dispatch never silently no-ops. The gate consults the
probe cache (src/probe-cache.ts, 60 s stable / 5 s short TTL) and only hits the
network / engine CLI on miss; vf doctor --refresh invalidates the cache.
Quota signals come from src/engine-quota.ts, which parses:
claude → `claude usage --json`
codex → `codex doctor --usage`
copilot → `gh api copilot`
Exhaustion, 429 (rate-limited), 403 (forbidden / billing region), and auth
failures all trigger fallback. A BLOCKED unit with no fallback engine is the
terminal state — the orchestrator surfaces the reason and stops.
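The fallback behaviour described in this section can be sketched as a loop over engines in priority order. This is an illustration, not the contents of `src/preflight-delegate.ts`: `pickEngine`, `Probe`, and the result shape are hypothetical, and the probe is modelled as a synchronous callback so the example stays self-contained (a real probe would consult the cache and may call the engine CLI).

```typescript
type Engine = "claude" | "codex" | "copilot";
type Layer = "presence" | "auth" | "quota";

// Assumed probe signature: true when the given layer passes for the engine.
type Probe = (engine: Engine, layer: Layer) => boolean;

type PreflightResult =
  | { status: "READY"; engine: Engine }
  | { status: "BLOCKED"; failures: Partial<Record<Engine, Layer>> };

const LAYERS: Layer[] = ["presence", "auth", "quota"];

function pickEngine(priority: Engine[], probe: Probe): PreflightResult {
  const failures: Partial<Record<Engine, Layer>> = {};
  for (const engine of priority) {
    // Stop at the first failing layer; later layers are not probed.
    const failed = LAYERS.find((layer) => !probe(engine, layer));
    if (failed === undefined) return { status: "READY", engine };
    failures[engine] = failed;
  }
  // No engine passed all three layers: report why instead of silently no-opping.
  return { status: "BLOCKED", failures };
}
```

Keeping the first failing layer per engine gives the orchestrator a reason to surface when the unit ends up BLOCKED.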
Handoff triage
Every agent writes a structured handoff with a terminal status. Triage precedes the verification gates: do not run Build/Lint/Test/Review on a unit whose status is anything other than DONE until the underlying issue is resolved.
DONE → proceed to verification gates
BLOCKED → read HANDOFF.md; create new unit for missing scope, adjust scope, or cancel
TOO_BIG → re-decompose into 2+ smaller non-overlapping units
AMBIGUOUS → clarify spec/constraints with user, then re-dispatch with updated CONTEXT.md
REGRESSED → fix the regression before any other gate
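The triage table maps naturally onto an exhaustive switch. In the sketch below, `Handoff`, `NextStep`, and `triage` are hypothetical names; the downgrade of an evidence-free DONE to AMBIGUOUS follows the evidence requirement stated earlier in this document.

```typescript
type HandoffStatus = "DONE" | "BLOCKED" | "TOO_BIG" | "AMBIGUOUS" | "REGRESSED";

type NextStep =
  | "run-verification-gates"
  | "read-handoff-and-rescope"
  | "re-decompose"
  | "clarify-with-user"
  | "fix-regression";

interface Handoff {
  status: HandoffStatus;
  evidence: string[]; // recorded evidence entries; may be empty
}

function triage(handoff: Handoff): NextStep {
  // A DONE handoff with no evidence ledger is treated as AMBIGUOUS.
  const status: HandoffStatus =
    handoff.status === "DONE" && handoff.evidence.length === 0
      ? "AMBIGUOUS"
      : handoff.status;

  switch (status) {
    case "DONE":
      return "run-verification-gates";
    case "BLOCKED":
      return "read-handoff-and-rescope";
    case "TOO_BIG":
      return "re-decompose";
    case "AMBIGUOUS":
      return "clarify-with-user";
    case "REGRESSED":
      return "fix-regression";
  }
}
```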
Goal-evaluation loop
After all per-unit gates pass, the orchestrator (never a sub-agent) evaluates the overarching goal against success criteria that were defined during Plan — not invented now.
Goal met → proceed to Close
Goal partially met → return to Plan; create NEW units for the gaps (never re-open closed units)
Goal blocked → record the gap in HANDOFF.md, surface to the user
The goal-eval result is recorded as evidence so the decision is auditable.
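A minimal sketch of the loop's decision step follows. `evaluateGoal` and `GoalEval` are hypothetical names, and representing "proven" criteria as a set of strings is an assumption; the point is that the criteria come from Plan and the function only compares them against what has been proven.

```typescript
interface GoalEval {
  outcome: "met" | "partial" | "blocked";
  next: "close" | "plan-new-units" | "surface-to-user";
  gaps: string[]; // unmet criteria, or the blocking reason
}

function evaluateGoal(
  criteria: string[],          // success criteria defined during Plan
  proven: ReadonlySet<string>, // criteria backed by recorded evidence
  blockedReason?: string,      // set when the goal cannot be reached
): GoalEval {
  if (blockedReason) {
    return { outcome: "blocked", next: "surface-to-user", gaps: [blockedReason] };
  }
  const gaps = criteria.filter((criterion) => !proven.has(criterion));
  if (gaps.length === 0) {
    return { outcome: "met", next: "close", gaps };
  }
  // Gaps become NEW units; closed units are never re-opened.
  return { outcome: "partial", next: "plan-new-units", gaps };
}
```

Serialising the returned object into the evidence folder would be one way to keep the decision auditable.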
Resource and progress tracking
The orchestration ledger lives in .vibeflow/WORKFLOW_STATE.json and aggregates every
work unit, so progress and resource use are observable at a glance instead of being lost
inside agent contexts.
{
  "task_id": "TASK-123",
  "goal": "Refactor auth without breaking login",
  "success_criteria": ["all auth tests pass", "login e2e green"],
  "work_units": [
    {
      "name": "auth-refactor",
      "status": "verifying",
      "confidence": 1.0,
      "owner_agent": "backend-engineer",
      "skills_used": ["repo-onboarding"],
      "gates": { "build": "pass", "lint": "pass", "test": "running", "review": "pending" },
      "resources": { "agents": 1, "tokens": 48213, "cost_usd": 0.42, "wall_seconds": 95 },
      "evidence": ["evidence/build.log", "evidence/lint.txt"]
    }
  ],
  "totals": { "units": 3, "done": 1, "tokens": 152104, "cost_usd": 1.31, "wall_seconds": 410 }
}
Tracked per work unit and rolled up to totals:
- status and gate state (pending / running / verifying / done / blocked)
- decision confidence
- owner agent and skills used (for skill-compliance checks)
- resources: agent count, tokens, estimated cost, wall-clock time
- evidence file paths
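The roll-up into `totals` can be sketched as a simple reduction. `LedgerUnit`, `Totals`, and `rollUp` are hypothetical names, and two details are assumptions: wall-clock seconds are summed per unit (parallel units would make elapsed time shorter than this sum), and cost is rounded to cents.

```typescript
interface LedgerUnit {
  status: string;
  resources: { tokens: number; cost_usd: number; wall_seconds: number };
}

interface Totals {
  units: number;
  done: number;
  tokens: number;
  cost_usd: number;
  wall_seconds: number;
}

function rollUp(units: LedgerUnit[]): Totals {
  const totals: Totals = { units: units.length, done: 0, tokens: 0, cost_usd: 0, wall_seconds: 0 };
  for (const unit of units) {
    if (unit.status === "done") totals.done += 1;
    totals.tokens += unit.resources.tokens;
    totals.cost_usd += unit.resources.cost_usd;
    totals.wall_seconds += unit.resources.wall_seconds;
  }
  // Round to cents so floating-point drift does not leak into the ledger.
  totals.cost_usd = Math.round(totals.cost_usd * 100) / 100;
  return totals;
}
```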
Web UI surfacing
The web UI (WEB_UI_DESIGN.md) renders this ledger as a live orchestration dashboard so
the user can follow quality and resource consumption without reading raw logs:
- Work-unit board: one card per unit showing status, gates, owner, confidence
- Gate strip: build / lint / test / review with pass / fail / running / pending
- Resource meter: tokens, estimated cost, elapsed time per unit and in total
- Evidence drawer: links to recorded gate output under evidence/
- Triage banner: any BLOCKED / TOO_BIG / AMBIGUOUS / REGRESSED unit is surfaced first
Updates stream over the existing WebSocket/SSE channel (WEB_UI_DESIGN.md).
CLI surfacing
The same ledger is inspectable from the terminal (see COMMAND_REFERENCE.md):
vf units status # board: status, gates, owner, confidence per unit
vf units show <name> # one unit: scope, todos, gates, evidence, resources
vf units resources # token / cost / wall-time totals across units
vf units evidence <name> # recorded gate output for a unit
Sub-agent guardrails
Conventions injected into every dispatched agent’s CONTEXT.md and enforced by hooks where
possible (HOOKS_AND_GUARDRAILS.md, SECURITY_MODEL.md):
- stay in scope: never edit files outside the unit's declared scope
- escalate, don't expand: write the gap to HANDOFF.md and stop; the orchestrator decides
- no over-implementation: do only what TODO.md specifies
- handoff before stopping: always write a structured handoff with status + changed files
- the orchestrator commits/merges; sub-agents must not push or merge
Mapping to the rest of the spec
AGENT_ORCHESTRATION_POLICY.md → policy (roles, confidence thresholds, debate, parallelism)
WORK_UNIT_ORCHESTRATION.md → mechanism (file-backed units, gates, ledger) [this doc]
HOOKS_AND_GUARDRAILS.md → enforcement points (final-verify, skill-compliance, pre-write)
WEB_UI_DESIGN.md → operator view (work-unit board, gates, resource meter)
GENERATED_FILES.md → .vibeflow/workunits/* file layout
WORKFLOW.md → end-to-end run that drives these units
Related: Agent Orchestration Policy · Skill Discovery and Evolution
Spec-first test generation (ADR-002)
When work_unit.spec is non-empty, vf can generate test stubs from the spec BEFORE
dispatching the implementer. The LLM sees ONLY the spec — no source code, no implementation.
Phase 1 (current): generateSpecFirstTests() is available as an injectable function; the --spec-first CLI opt-in arrives in Phase 2.
Phase 2: vf orchestrate --spec-first generates *.spec-first.test.ts before dispatch.
Protection rule: pre-write hook blocks any write to *.spec-first.* files during dispatch.
These files are oracle tests — the implementer must pass them, not change them.
Use generateSpecFirstTests({ unitName, spec, llmFn }) to generate stubs programmatically.
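The protection rule amounts to a filename check in the pre-write hook. The sketch below is an assumption about how such a check could look, not the hook's actual code: `isSpecFirstFile` and `allowWrite` are hypothetical names, and paths are assumed to use forward slashes.

```typescript
// True when the file name matches the *.spec-first.* pattern.
function isSpecFirstFile(path: string): boolean {
  const baseName = path.split("/").pop() ?? "";
  return /\.spec-first\./.test(baseName);
}

// Oracle tests are read-only while an implementer is dispatched.
function allowWrite(path: string, duringDispatch: boolean): boolean {
  return !(duringDispatch && isSpecFirstFile(path));
}
```

With this check, a write to `src/auth/login.spec-first.test.ts` is refused during dispatch, while `src/auth/login.test.ts` remains writable.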