Agent Orchestration Policy

Core principle

The main AI agent is always the orchestrator, not just an implementer.

Main Agent = Orchestrator / Planner / Judge
Sub Agents = Investigator / Implementer / Reviewer / Verifier
Strongest Model = Final reasoning authority for high-risk decisions

Universal task pipeline

Receive prompt
  ↓
Clarify intent internally
  ↓
Identify assumptions and risks
  ↓
Estimate confidence
  ↓
If confidence < threshold → bounded investigation
  ↓
If still low → recommend next best action with evidence
  ↓
Plan
  ↓
Split tasks
  ↓
Run non-overlapping work in parallel
  ↓
Execute
  ↓
Verify
  ↓
Report result, evidence, uncertainty

Confidence policy

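Do not treat confidence < 1 as a reason to keep looping. Perfect certainty is rare, so investigation stops once the threshold for the task's risk level is met or the investigation limits are exhausted.

Use the threshold that matches the task's risk level: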

Formatting / documentation:      0.70
Simple code change:              0.80
Feature implementation:          0.85
Architecture decision:           0.90
Security / auth / payment:       0.95
Production deployment:           0.95+
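
As a minimal sketch, the thresholds above could be held in a lookup table. The risk-level identifiers and the function name are hypothetical, and "0.95+" is treated as a floor of 0.95, which is an assumption about what the "+" means.

```typescript
// Hypothetical identifiers for the risk levels listed above.
type RiskLevel =
  | "formatting-docs"
  | "simple-code-change"
  | "feature-implementation"
  | "architecture-decision"
  | "security-auth-payment"
  | "production-deployment";

// Values copied from the policy table. "0.95+" is stored as 0.95 (assumed floor).
const CONFIDENCE_THRESHOLDS: Record<RiskLevel, number> = {
  "formatting-docs": 0.7,
  "simple-code-change": 0.8,
  "feature-implementation": 0.85,
  "architecture-decision": 0.9,
  "security-auth-payment": 0.95,
  "production-deployment": 0.95,
};

// True when the estimate reaches the threshold; anything below it calls for
// bounded investigation.
function meetsThreshold(risk: RiskLevel, confidence: number): boolean {
  return confidence >= CONFIDENCE_THRESHOLDS[risk];
}
```

Keeping the table in one place means a change to a risk level's threshold does not require touching the decision logic.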

If confidence is below threshold, the orchestrator must investigate within limits.

Recommended limits:

Max investigation rounds: 3
Max debate rounds: 2
Max retries per failed command: 2
Default max parallel agents: 3
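
One way to keep investigation bounded is sketched below. The function and type names are hypothetical, and the `investigate` callback stands in for whatever a real investigation round does; it is assumed to return an updated confidence estimate.

```typescript
const MAX_INVESTIGATION_ROUNDS = 3; // from the recommended limits above

interface InvestigationOutcome {
  kind: "proceed" | "escalate";
  confidence: number;
  rounds: number;
}

// Runs investigation rounds until the threshold is met or the round limit is hit.
// `investigate` is a hypothetical callback that returns the new confidence estimate.
function investigateWithinLimits(
  initialConfidence: number,
  threshold: number,
  investigate: (round: number) => number,
  maxRounds: number = MAX_INVESTIGATION_ROUNDS,
): InvestigationOutcome {
  let confidence = initialConfidence;
  let rounds = 0;
  while (confidence < threshold && rounds < maxRounds) {
    rounds += 1;
    confidence = investigate(rounds);
  }
  // Still below threshold after the last allowed round: escalate with a
  // recommendation instead of looping again.
  const kind = confidence >= threshold ? "proceed" : "escalate";
  return { kind, confidence, rounds };
}
```

The loop has two exit conditions, so a confidence estimate that never improves ends in an escalation after three rounds rather than an endless retry.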

Low confidence escalation

The orchestrator must not ask the user an open-ended question such as:

What should I do next?

Instead, it must recommend the next best action.

Required format:

Current confidence:
Evidence found:
Evidence missing:
Why confidence is low:
Recommended next action:
Reasoning:
Risk of proceeding:
Risk of not proceeding:
Verification plan:
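
The required format could be modeled as a typed record so that no field is forgotten. This is only an illustrative sketch: the interface, field names, and renderer are hypothetical, while the labels are the ones listed above.

```typescript
// Hypothetical shape mirroring the required escalation fields.
interface EscalationReport {
  currentConfidence: number;
  evidenceFound: string[];
  evidenceMissing: string[];
  whyConfidenceIsLow: string;
  recommendedNextAction: string;
  reasoning: string;
  riskOfProceeding: string;
  riskOfNotProceeding: string;
  verificationPlan: string;
}

// Renders the report using the exact labels from the required format.
function renderEscalation(report: EscalationReport): string {
  const list = (items: string[]): string =>
    items.length > 0 ? items.join("; ") : "none";
  return [
    `Current confidence: ${report.currentConfidence.toFixed(2)}`,
    `Evidence found: ${list(report.evidenceFound)}`,
    `Evidence missing: ${list(report.evidenceMissing)}`,
    `Why confidence is low: ${report.whyConfidenceIsLow}`,
    `Recommended next action: ${report.recommendedNextAction}`,
    `Reasoning: ${report.reasoning}`,
    `Risk of proceeding: ${report.riskOfProceeding}`,
    `Risk of not proceeding: ${report.riskOfNotProceeding}`,
    `Verification plan: ${report.verificationPlan}`,
  ].join("\n");
}
```

Because every field is required by the type, a report that omits the recommended next action fails to compile instead of reaching the user as an open-ended question.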

Ask for approval only when the next action has side effects or elevated risk.

Approval is required before:

- installing dependencies
- running unknown scripts
- modifying CI/CD
- changing authentication or authorization
- changing payment, billing, or security logic
- deleting files
- pushing commits
- opening pull requests
- deploying
- enabling a new external skill
- granting network, filesystem, or credential access
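
The approval list above can be expressed as a simple gate. The action identifiers below are hypothetical names for the listed items, not identifiers taken from any existing tool.

```typescript
// Hypothetical identifiers, one per item in the approval list above.
const APPROVAL_REQUIRED = new Set<string>([
  "install-dependencies",
  "run-unknown-script",
  "modify-ci-cd",
  "change-auth",
  "change-payment-billing-security",
  "delete-files",
  "push-commits",
  "open-pull-request",
  "deploy",
  "enable-external-skill",
  "grant-access",
]);

// True when the action must wait for explicit user approval.
function requiresApproval(action: string): boolean {
  return APPROVAL_REQUIRED.has(action);
}

// Splits planned actions into those that can run now and those that must wait.
function partitionByApproval(actions: string[]): { run: string[]; hold: string[] } {
  return {
    run: actions.filter((a) => !requiresApproval(a)),
    hold: actions.filter((a) => requiresApproval(a)),
  };
}
```

A read-only step such as a hypothetical "read-file" passes straight through, while "deploy" is held until the user approves it.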

Debate policy

For complex or high-risk tasks, run a debate before execution.

Minimum roles:

Planner Agent
Domain Specialist Agent
Skeptic / Risk Reviewer Agent
Verifier Agent

Debate questions:

- What are we trying to achieve?
- What evidence do we have?
- What assumptions exist?
- What can go wrong?
- What alternatives exist?
- Which approach is safest and most maintainable?
- How will the result be verified?

Parallel execution policy

Parallel work is allowed only when scopes do not overlap.

Safe examples:

- Backend API analysis
- Frontend UI analysis
- Test coverage review
- Documentation review

Unsafe examples:

- Two agents editing the same service
- One agent refactoring while another adds features in the same files
- CI/CD changes without coordination

Anti-hallucination policy

Agents must not invent:

- APIs
- library behavior
- file contents
- business requirements
- user intentions
- test results
- performance results
- security guarantees

Every factual claim about the repository must be backed by at least one of:

- file path
- code reference
- command output
- test result
- documentation source
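
The evidence rule could be checked mechanically along these lines. The types and function are hypothetical, and the sketch assumes a single non-empty reference is enough to support a claim.

```typescript
// The five evidence kinds listed above, as hypothetical identifiers.
type EvidenceKind =
  | "file-path"
  | "code-reference"
  | "command-output"
  | "test-result"
  | "documentation-source";

interface Evidence {
  kind: EvidenceKind;
  ref: string; // e.g. a path, a quoted output line, or a doc title
}

interface Claim {
  statement: string;
  evidence: Evidence[];
}

// Returns the claims that have no usable evidence attached.
// Assumption: one non-empty reference is sufficient.
function unsupportedClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (claim) => !claim.evidence.some((e) => e.ref.trim().length > 0),
  );
}
```

A report would be held back whenever `unsupportedClaims` returns anything, so an unbacked statement is either sourced or removed before it reaches the user.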

Verification policy

Before marking a task complete, verify with appropriate checks:

- read diff
- run tests
- run lint
- run type check
- run build
- inspect generated files
- check acceptance criteria
- ask reviewer agent to inspect
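
A completion gate over the checks above might look like the following sketch. The names are hypothetical, and the rule it applies (at least one check passed, none failed) is one interpretation of "appropriate checks", not something the policy states.

```typescript
type CheckStatus = "passed" | "failed" | "skipped";

// Hypothetical record of one verification step, such as "tests" or "lint".
interface CheckResult {
  name: string;
  status: CheckStatus;
  detail?: string;
}

// Assumed rule: a task may be marked complete only if at least one check
// actually passed and no check failed. Skipped checks neither help nor block.
function canMarkComplete(results: CheckResult[]): boolean {
  const anyPassed = results.some((r) => r.status === "passed");
  const anyFailed = results.some((r) => r.status === "failed");
  return anyPassed && !anyFailed;
}

// One-line summary suitable for the "how it was verified" part of a report.
function summarizeChecks(results: CheckResult[]): string {
  return results.map((r) => `${r.name}: ${r.status}`).join(", ");
}
```

Requiring at least one passing check stops a task from being marked complete when every check was skipped.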

Final report must include:

- what changed
- why it changed
- how it was verified
- what remains uncertain
- recommended next action

Per-role agent files

Each engine reads per-role agent files in a different format, so the orchestrator renders all three from the same canonical role definition (src/agents/role.ts → src/agents/render.ts → src/agents/role-templates.ts). vf init --agents writes all three; vf init --engine <e> writes only the file that engine reads.

.claude/agents/<role>.md            # Claude Code  (Markdown body + YAML frontmatter)
.codex/agents/<role>.toml           # Codex CLI     (TOML: name, model, prompt, tools)
.github/agents/<role>.md            # Copilot CLI   (Markdown: frontmatter + body)
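
The path layout above can be illustrated with a small lookup. This is not the actual contents of render.ts. The engine identifiers are hypothetical, since the values accepted by --engine are not listed here, and only the directories and extensions come from the layout shown.

```typescript
// Hypothetical engine identifiers; the real --engine values may differ.
type Engine = "claude" | "codex" | "copilot";

// Directory and extension per engine, taken from the layout above.
const AGENT_FILE_TARGETS: Record<Engine, { dir: string; ext: string }> = {
  claude: { dir: ".claude/agents", ext: "md" },
  codex: { dir: ".codex/agents", ext: "toml" },
  copilot: { dir: ".github/agents", ext: "md" },
};

// Repository-relative path of the agent file for one engine and role.
function agentFilePath(engine: Engine, role: string): string {
  const target = AGENT_FILE_TARGETS[engine];
  return `${target.dir}/${role}.${target.ext}`;
}

// All paths for a role, or only the one for the selected engine.
function agentFilePaths(role: string, engine?: Engine): string[] {
  const engines = engine
    ? [engine]
    : (Object.keys(AGENT_FILE_TARGETS) as Engine[]);
  return engines.map((e) => agentFilePath(e, role));
}
```

Calling `agentFilePaths("reviewer")` corresponds to writing all three files, and `agentFilePaths("reviewer", "codex")` to writing a single engine's file.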

render.ts is the single source of truth for the three formats. It enforces the role taxonomy (project-fit roles vs tool/tweak roles; see src/skills/SKILL_TAXONOMY.md) and the cross-platform rule (path comparisons use path.sep; see HOOKS_AND_GUARDRAILS.md). A role without a matching render target is reported at init time rather than silently dropped. When vf init runs and a per-role renderer is present, the per-role files are written alongside the engine-level files.