Multi-Agent Development: A Practical Guide
A year ago, having one AI coding agent in your workflow was novel. Today, a single developer might use Claude Code for complex refactors, Cursor for interactive editing, and GitHub Copilot for inline completions - all in the same afternoon, all on the same codebase.
Teams amplify this further. Five developers, each with their preferred AI tool, generating code across shared modules. The codebase evolves faster than any individual can track. And here's the uncomfortable truth: none of these agents know the others exist.
This is the multi-agent coordination problem. It's not hypothetical - it's happening in every team that uses AI coding tools. This guide covers what goes wrong, why it goes wrong, and practical approaches to keep multi-agent development from becoming multi-agent chaos.
The Coordination Problem
In traditional development, coordination happens through social mechanisms: standups, Slack threads, code review, and shared context from working together over time. Developers know what their teammates are working on, roughly where in the codebase, and what patterns to follow.
AI agents have none of this. Each agent operates in isolation:
- No shared memory. Claude Code doesn't know what Cursor did five minutes ago in the same file.
- No awareness of concurrent work. Two agents can modify the same module simultaneously with no merge conflict until push time.
- No stylistic consistency. Each agent has its own default patterns, naming conventions, and architectural preferences.
- No workload visibility. There's no way to see, at a glance, which agents are working on what parts of the codebase.
The result is predictable: conflicting changes, duplicated work, inconsistent patterns, and merge conflicts that take longer to resolve than the original work took to generate.
Session Governance: The Foundation
The first step in multi-agent coordination is making each agent's work visible and bounded. This is what session governance provides.
A governed session wraps an agent's work in a tracked lifecycle with explicit parameters:
# Developer A starts a session for Claude Code
exo session-start --ticket AUTH-42 --vendor anthropic --model claude-opus-4
# Developer B starts a session for Cursor
exo session-start --ticket PERF-17 --vendor anthropic --model claude-sonnet-4
Each session records:
- Who initiated it (developer + agent identity)
- What it's working on (ticket reference with scope and budget)
- When it started and finished
- Where it's allowed to operate (file scope, deny patterns)
- How much it can change (file budgets)
This metadata transforms invisible agent work into auditable, attributed changes. When something breaks, you can answer "which agent touched this file, in which session, for which ticket?"
Scope Isolation
The most powerful coordination mechanism is scope isolation. By assigning non-overlapping scopes to concurrent sessions, you prevent agents from stepping on each other:
# Session for AUTH-42
scope_allow:
  - "src/auth/**"
  - "tests/test_auth/**"
scope_deny:
  - "src/database/**"

# Session for PERF-17
scope_allow:
  - "src/database/**"
  - "src/cache/**"
scope_deny:
  - "src/auth/**"
With these scopes, even if both agents run simultaneously, they cannot modify the same files: merge conflicts become structurally impossible for governed paths.
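The scope check itself is simple to reason about. Here is a minimal sketch, using `fnmatch` to approximate `**` glob semantics (fnmatch's `*` already crosses `/` boundaries; a real implementation would use a proper glob library):

```python
from fnmatch import fnmatch

def path_in_scope(path: str, allow: list[str], deny: list[str]) -> bool:
    """Return True if `path` falls inside the session's allowed scope.

    Deny patterns win over allow patterns, so an explicit deny always
    blocks a write even when an allow pattern also matches.
    """
    if any(fnmatch(path, pat) for pat in deny):
        return False
    return any(fnmatch(path, pat) for pat in allow)

# The AUTH-42 scope from the YAML above
allow = ["src/auth/**", "tests/test_auth/**"]
deny = ["src/database/**"]

print(path_in_scope("src/auth/tokens.py", allow, deny))      # True
print(path_in_scope("src/database/models.py", allow, deny))  # False
```

Because the PERF-17 session's allow list is the AUTH-42 session's deny list (and vice versa), no path can pass both sessions' checks at once.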
Intent Hierarchy
For more complex coordination, ExoProtocol supports an intent hierarchy. High-level intents break down into epics and tasks, each with their own scope and budget:
# Create a top-level intent
exo intent-create --brain-dump "Rebuild authentication system" \
  --boundary "Must not change user-facing API contracts" \
  --success-condition "All auth tests pass, JWT refresh working" \
  --scope-allow "src/auth/**" --scope-allow "tests/test_auth/**" \
  --max-files 20 --max-loc 1000

# Break it into tasks assigned to different agents
exo ticket-create --kind task --parent INTENT-001 \
  --scope-allow "src/auth/tokens.py" --max-files 3
exo ticket-create --kind task --parent INTENT-001 \
  --scope-allow "src/auth/middleware.py" --max-files 3
Each task inherits the intent's boundary constraints but has its own focused scope. Different agents can work on different tasks under the same intent, with automatic validation that no task exceeds the intent's boundaries.
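That parent-boundary validation can be sketched as follows. The function signature and the sibling-budget bookkeeping are illustrative assumptions, not ExoProtocol's actual internals:

```python
from fnmatch import fnmatch

def within_intent(task_files: list[str], intent_allow: list[str],
                  intent_max_files: int, files_used_by_siblings: int) -> bool:
    """Hypothetical check that a task stays inside its parent intent's
    boundary: every touched file must match an intent-level allow
    pattern, and the combined file count across sibling tasks must not
    exceed the intent's budget."""
    in_scope = all(any(fnmatch(f, pat) for pat in intent_allow)
                   for f in task_files)
    in_budget = files_used_by_siblings + len(task_files) <= intent_max_files
    return in_scope and in_budget

# A task editing tokens.py under INTENT-001 (scope src/auth/**, max 20 files),
# with 5 files already claimed by sibling tasks
ok = within_intent(["src/auth/tokens.py"],
                   ["src/auth/**", "tests/test_auth/**"], 20, 5)
print(ok)  # True
```

A task that strayed into `src/database/` or pushed the intent past its 20-file budget would fail the same check, regardless of which agent ran it.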
Drift Detection Across Sessions
When multiple agents work on related code, drift becomes a team-level concern. Individual sessions might each have low drift, but the aggregate effect can still be problematic.
ExoProtocol's PR-level check aggregates drift across all sessions that contributed to a pull request:
$ exo pr-check --base main

PR Governance Report
====================
Verdict: WARN
Sessions: 3 matched

Session s-abc (AUTH-42, claude-opus-4):
  Drift Score: 0.15
  Verdict: PASS

Session s-def (AUTH-43, claude-sonnet-4):
  Drift Score: 0.42
  Scope Violations: 2 files
  Verdict: WARN

Session s-ghi (AUTH-44, gpt-4):
  Drift Score: 0.08
  Verdict: PASS

Aggregate:
  Total files changed: 14
  Files with scope violations: 2
  Average drift: 0.22
  Max drift: 0.42 (session s-def)
This report immediately identifies which agent, in which session, caused the drift. The reviewer knows exactly where to focus: session s-def using Claude Sonnet touched files outside its allowed scope.
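The aggregation itself is straightforward. Here is a sketch that reproduces the numbers in the report above; the field names and the 0.3 warn threshold are illustrative assumptions, not ExoProtocol's actual schema or defaults:

```python
def aggregate_drift(sessions: list[dict]) -> dict:
    """Roll up per-session drift scores the way a PR-level check might."""
    scores = [s["drift"] for s in sessions]
    worst = max(sessions, key=lambda s: s["drift"])
    return {
        "avg_drift": round(sum(scores) / len(scores), 2),
        "max_drift": worst["drift"],
        "max_session": worst["id"],
        # Hypothetical threshold: any session above 0.3 escalates to WARN
        "verdict": "WARN" if worst["drift"] > 0.3 else "PASS",
    }

sessions = [
    {"id": "s-abc", "drift": 0.15},
    {"id": "s-def", "drift": 0.42},
    {"id": "s-ghi", "drift": 0.08},
]
print(aggregate_drift(sessions))
# {'avg_drift': 0.22, 'max_drift': 0.42, 'max_session': 's-def', 'verdict': 'WARN'}
```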
Cross-Session Patterns
Over time, drift data reveals patterns:
- Agent-specific tendencies. Does one model consistently exceed file budgets while another stays within bounds? Adjust your prompt or configuration for the drifty one.
- Scope overlap hotspots. If multiple sessions keep touching the same files outside their scope, those files probably need to be explicitly included in someone's scope or broken into their own module.
- Budget calibration. If every session hits 90% of its file budget, your budgets might be too tight. If sessions routinely use only 30%, tighten them to reduce the blast radius.
Feature Traceability with Code Tags
When multiple agents contribute to the same feature, you need to track which code belongs to which feature - regardless of which agent wrote it. ExoProtocol's feature manifest and code tags solve this:
# .exo/features.yaml
features:
  auth-jwt:
    status: active
    description: "JWT-based authentication"
    allow_agent_edit: true
  auth-refresh:
    status: active
    description: "Token refresh flow"
    allow_agent_edit: true
  legacy-session:
    status: deprecated
    description: "Cookie-based session auth"
    allow_agent_edit: false
Agents tag their code with feature annotations:
import jwt  # PyJWT
from datetime import datetime, timedelta

# @feature:auth-jwt
class JWTTokenManager:
    def create_token(self, user_id: str) -> str:
        """Create a signed JWT token."""
        payload = {
            "sub": user_id,
            "exp": datetime.utcnow() + timedelta(hours=1),
        }
        return jwt.encode(payload, self.secret, algorithm="HS256")
# @endfeature

# @feature:auth-refresh
class TokenRefreshService:
    def refresh(self, token: str) -> str:
        """Issue a new token from a valid refresh token."""
        claims = self.verify_refresh_token(token)
        return self.token_manager.create_token(claims["sub"])
# @endfeature
The exo trace command scans the codebase and validates that:
- Every @feature: tag references a feature that exists in the manifest
- Deleted features don't have lingering code
- Deprecated features aren't being actively edited
- Features locked with allow_agent_edit: false aren't modified by agent sessions
This is especially valuable in multi-agent environments because it answers the question: "Which code belongs to which feature?" regardless of which agent or developer wrote it.
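The core of such a scan is small enough to sketch. This is an illustrative version of the kind of validation exo trace performs, not its real implementation; the manifest dict mirrors the YAML example above:

```python
import re

FEATURE_TAG = re.compile(r"#\s*@feature:([\w-]+)")

def validate_tags(source: str, manifest: dict) -> list[str]:
    """Check every @feature tag in `source` against the manifest:
    unknown features and edits to locked features are reported."""
    problems = []
    for tag in FEATURE_TAG.findall(source):
        feature = manifest.get(tag)
        if feature is None:
            problems.append(f"unknown feature: {tag}")
        elif not feature.get("allow_agent_edit", True):
            problems.append(f"locked feature edited: {tag}")
    return problems

manifest = {
    "auth-jwt": {"status": "active", "allow_agent_edit": True},
    "legacy-session": {"status": "deprecated", "allow_agent_edit": False},
}
code = "# @feature:auth-jwt\nclass JWTTokenManager: ...\n# @endfeature\n"
print(validate_tags(code, manifest))                        # []
print(validate_tags("# @feature:legacy-session\n", manifest))
# ['locked feature edited: legacy-session']
```

A real scanner would also pair each @feature with its @endfeature and walk the whole tree, but the manifest lookup above is the heart of the check.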
Requirement Coverage
Features describe what exists. Requirements describe what must exist. ExoProtocol's requirement registry tracks the latter:
# .exo/requirements.yaml
requirements:
  REQ-AUTH-01:
    description: "Users must authenticate with email and password"
    status: active
    priority: high
  REQ-AUTH-02:
    description: "Tokens must expire after 1 hour"
    status: active
    priority: high
  REQ-AUTH-03:
    description: "Refresh tokens must be single-use"
    status: active
    priority: medium
Code references requirements with @req: or @implements: annotations:
# @implements:REQ-AUTH-01
def authenticate(email: str, password: str) -> AuthResult:
    user = user_repo.find_by_email(email)
    if user and verify_password(password, user.password_hash):
        return AuthResult(success=True, token=create_token(user.id))
    return AuthResult(success=False)

# @req:REQ-AUTH-02
TOKEN_EXPIRY_SECONDS = 3600  # 1 hour
The exo trace-reqs command checks coverage:
$ exo trace-reqs
Requirement Coverage Report
============================
REQ-AUTH-01: COVERED (2 implementations)
REQ-AUTH-02: COVERED (1 implementation)
REQ-AUTH-03: UNCOVERED - no implementations found
Violations: 0 errors, 1 warning
WARNING: REQ-AUTH-03 has no implementing code
In a multi-agent environment, requirement traceability ensures that every requirement has implementing code, regardless of which agent wrote it or when. Gaps are detected automatically.
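The coverage logic reduces to counting annotations per requirement. Here is a minimal sketch of what exo trace-reqs reports (an assumption about its approach, not its actual code); the requirement IDs mirror the registry example above:

```python
import re

REQ_TAG = re.compile(r"@(?:req|implements):(REQ-[\w-]+)")

def coverage(sources: list[str], requirements: list[str]) -> dict:
    """Count @req:/@implements: annotations per requirement and flag
    any requirement with no implementing code."""
    counts = {req: 0 for req in requirements}
    for src in sources:
        for req in REQ_TAG.findall(src):
            if req in counts:
                counts[req] += 1
    uncovered = [req for req, n in counts.items() if n == 0]
    return {"counts": counts, "uncovered": uncovered}

sources = [
    "# @implements:REQ-AUTH-01\ndef authenticate(): ...",
    "# @req:REQ-AUTH-02\nTOKEN_EXPIRY_SECONDS = 3600",
]
report = coverage(sources, ["REQ-AUTH-01", "REQ-AUTH-02", "REQ-AUTH-03"])
print(report["uncovered"])  # ['REQ-AUTH-03']
```

Because the scan runs over the whole codebase, the gap for REQ-AUTH-03 surfaces no matter which agent or developer was supposed to implement it.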
Putting It All Together
Here's a practical workflow for a team of three developers using different AI tools:
1. Initialize governance once:
exo init && exo compile
2. Create a feature manifest and requirements:
# Define features in .exo/features.yaml
# Define requirements in .exo/requirements.yaml
3. Generate agent configs for all tools:
exo adapter-generate --target claude
exo adapter-generate --target cursor
exo adapter-generate --target agents
4. Each developer starts governed sessions:
# Developer A (Claude Code)
exo session-start --ticket AUTH-42 --vendor anthropic --model claude-opus-4
# Developer B (Cursor)
exo session-start --ticket PERF-17 --vendor anthropic --model claude-sonnet-4
# Developer C (Copilot Workspace)
exo session-start --ticket UI-88 --vendor openai --model gpt-4
5. Each developer finishes their session:
exo session-finish --session-id <id> --drift-threshold 0.5
6. PR checks aggregate everything:
exo pr-check --base main
The PR check shows all sessions, all drift scores, all scope violations, feature coverage, and requirement coverage in a single report. The reviewer sees the complete picture.
The Takeaway
Multi-agent development isn't a future problem - it's a present reality. The question isn't whether your team will use multiple AI coding agents. The question is whether you'll have visibility into what those agents are doing.
Session governance provides that visibility. Scope isolation prevents conflicts. Drift detection catches overreach. Feature and requirement traceability ensure completeness. Together, they transform multi-agent chaos into multi-agent coordination.
Start small: initialize governance, set up governed sessions, and run your first PR check. The data will show you where your agents need tighter guardrails and where they're operating well within bounds.
Your AI agents are your most active contributors. Govern them accordingly.