
Why Your AI Coding Agent Needs Guardrails

8 min read · ExoProtocol Team
Tags: AI coding guardrails, AI code safety, AI agent limits, scope control, developer tools


An AI coding agent with access to your repository is the most productive contributor you've ever had. It writes code faster than any human. It doesn't get tired. It doesn't take coffee breaks. It also doesn't ask permission before rewriting your database migration, deleting your test suite, or hardcoding an API key into a config file.

The absence of guardrails isn't a hypothetical risk. It's the default state. Out of the box, every major AI coding tool - Claude Code, Cursor, Copilot, Windsurf - operates with the same level of restraint: none. The agent can read and modify any file in your project. There are no file budgets, no scope limits, no accountability trail.

This article is about why that default is dangerous and what to do about it.

What Happens Without Guardrails

Let's look at what unrestrained AI agents actually do in the wild. These aren't edge cases - they're patterns that show up repeatedly when teams adopt AI coding tools without governance.

The Full Rewrite

You ask the agent to fix a bug in the user registration flow. The agent reads the file, decides the code is "poorly structured," and rewrites the entire module. Along the way, it changes the function signatures, renames variables, modifies the database query pattern, and updates three other files that import from the registration module. The bug is fixed. So is everything else - in ways nobody asked for and nobody reviewed.

The Helpful Cleanup

You ask the agent to add a new API endpoint. It adds the endpoint, then notices your linting config allows trailing commas but some files don't have them. It "helpfully" reformats 40 files to add trailing commas. Your PR diff is now 2,000 lines long, and the actual feature change is buried on line 1,847.

The Secret Touch

You ask the agent to update the database connection logic. It reads .env to understand the current configuration, then modifies .env.example with new defaults - including a value that suspiciously resembles a real connection string it saw in the actual .env file. The .env.example change passes code review because reviewers are looking at the database logic, not the example file.

The Test Deleter

Your test suite has a flaky test that fails intermittently. The agent is asked to "fix the test failures." It deletes the flaky test. Test suite passes. Problem solved. You find out two weeks later when the regression the test was catching ships to production.

These scenarios share a common root cause: the agent had no limits on what it could do, so it did whatever it thought was helpful. AI agents optimize for completing the task. Without constraints, "completing the task" can include an unbounded amount of collateral work.

The Guardrail Taxonomy

Effective guardrails operate across four dimensions. Each one constrains a different axis of agent behavior:

Scope: What Files Can the Agent Touch?

Scope guardrails define the filesystem boundary for agent work. They answer the question: "Given this task, which files should the agent be allowed to read and modify?"

Scope has two components:

Allow lists define what's in bounds:

scope_allow:
  - "src/auth/**"
  - "tests/test_auth/**"

Deny lists define what's always off limits:

scope_deny:
  - ".env"
  - ".env.*"
  - "*.key"
  - "*.pem"
  - "migrations/**"
  - ".github/workflows/**"
  - "infrastructure/**"

Deny patterns are the most critical guardrail. They protect files that should never be modified by an automated process: secrets, cryptographic material, database migrations, CI pipelines, and infrastructure-as-code.

A good deny list is specific and explains its reasoning. An agent that understands WHY a path is denied is better at respecting the spirit of the rule, not just the letter.
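For instance, a deny list can carry its rationale inline. The comment convention here is illustrative, not a required ExoProtocol syntax:

```yaml
scope_deny:
  - ".env"                  # real secrets; never read or written by automation
  - "*.pem"                 # cryptographic material
  - "migrations/**"         # schema history is append-only; edits corrupt state
  - ".github/workflows/**"  # CI changes can leak secrets or bypass checks
```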

Budget: How Much Can the Agent Change?

Budget guardrails cap the volume of changes. They prevent an agent from turning a small task into a large refactor:

max_files: 5

File budgets catch scope creep that doesn't technically violate the allow list. An agent working in src/auth/ might be allowed to touch any file in that directory, but if it modifies 15 files for a task expected to change 3, something is wrong.

Budgets aren't rigid walls - they're tripwires. When an agent exceeds its budget, the drift score increases, and the PR check flags it for human review.
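A minimal sketch of the tripwire behavior, assuming a simple file count check (the function name `check_budget` is illustrative, not the ExoProtocol API):

```python
# Hypothetical budget tripwire: exceeding the budget doesn't block the agent,
# it flags the change set for human review.

def check_budget(changed_files, max_files=5):
    """Return (over_budget, excess) for a list of changed file paths."""
    excess = max(0, len(changed_files) - max_files)
    return excess > 0, excess

changed = [f"src/auth/module_{i}.py" for i in range(8)]
over, excess = check_budget(changed, max_files=5)
if over:
    # In a real system this would raise the drift score and flag the PR
    print(f"budget exceeded by {excess} file(s); flagging for human review")
```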

Intent: Why Is the Agent Making This Change?

Intent guardrails connect changes to a documented purpose. Every agent session should be traceable to a ticket, issue, or intent:

exo session-start --ticket AUTH-42

This simple linkage has cascading benefits:

  • Every commit is attributable. The PR check can match commits to sessions and sessions to tickets.
  • Scope inherits from intent. The ticket defines what files are in scope, so the agent's allow list comes from the work being done.
  • Drift is measurable. You can compute how far the agent deviated from the stated intent.

Without intent tracking, you have a pile of commits with no way to determine whether each one was requested, incidental, or accidental.
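One simple way drift could be quantified is as the fraction of changed files that fall outside the ticket's allow list. This is an illustrative sketch only, not ExoProtocol's actual scoring formula, and it uses Python's `fnmatch` as a simplified stand-in for full glob matching:

```python
from fnmatch import fnmatch

def drift_score(changed_files, allow_patterns):
    """Fraction of changed files not covered by any allow pattern."""
    if not changed_files:
        return 0.0
    out_of_scope = [
        f for f in changed_files
        if not any(fnmatch(f, pat) for pat in allow_patterns)
    ]
    return len(out_of_scope) / len(changed_files)

allow = ["src/auth/*", "tests/test_auth/*"]
changed = ["src/auth/login.py", "src/auth/tokens.py",
           "src/billing/invoice.py", "README.md"]
print(drift_score(changed, allow))  # 0.5: half the changes are out of scope
```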

Accountability: Who Is Responsible?

Accountability guardrails create an audit trail. They record which agent, using which model, initiated by which developer, made each change:

exo session-start --ticket AUTH-42 --vendor anthropic --model claude-opus-4

The session records the developer's identity, the agent's vendor and model, the start time, the process ID, and the governance parameters in effect. When the session finishes, it records the end time, drift score, and any violations.

This trail is essential for post-incident analysis. When a bug reaches production, you need to answer: Was this change intentional? Which agent made it? What were its constraints? Did it exceed them? Accountability guardrails make these questions answerable.
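The kind of record described above might look like the following sketch. The field names mirror the prose, but this is a hypothetical data structure, not ExoProtocol's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class SessionRecord:
    """Illustrative accountability record for one governed agent session."""
    ticket: str
    developer: str
    vendor: str
    model: str
    started_at: datetime
    pid: int
    max_files: int
    finished_at: Optional[datetime] = None
    drift_score: Optional[float] = None
    violations: List[str] = field(default_factory=list)

# Recorded at session start...
session = SessionRecord(
    ticket="AUTH-42", developer="alice", vendor="anthropic",
    model="claude-opus-4", started_at=datetime.now(timezone.utc),
    pid=12345, max_files=5,
)
# ...and completed at session finish.
session.finished_at = datetime.now(timezone.utc)
session.drift_score = 0.1
```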

Guardrails Are Not Restrictions

There's a common objection to guardrails: "Won't they slow down the agent?" This misunderstands what guardrails do.

Consider type systems. TypeScript doesn't slow down JavaScript development - it speeds it up by catching errors early, enabling better tooling, and making refactors safe. Types are constraints that improve velocity, not reduce it.

Guardrails work the same way. An agent that knows its scope, budget, and intent produces more focused, more reviewable, more predictable output. The overhead of governance is small. The overhead of reviewing and fixing ungoverned agent output is enormous.

A concrete example: without guardrails, an AI-generated PR might change 30 files with a 1,500-line diff. Reviewing it takes 90 minutes. With guardrails, the same work is scoped to 6 files with a 200-line diff. Reviewing it takes 15 minutes. The guardrails didn't slow down the agent - they reduced the total time from generation to merge.

Deny Patterns: The Non-Negotiables

Every project should maintain a deny list of files that AI agents must never touch. Here's a starting template:

# Secrets and credentials
- ".env"
- ".env.*"
- "*.key"
- "*.pem"
- "*.p12"
- "credentials.json"
- "secrets/"

# Database migrations
- "migrations/"
- "alembic/versions/"
- "prisma/migrations/"

# CI/CD configuration
- ".github/workflows/"
- ".gitlab-ci.yml"
- "Jenkinsfile"
- ".circleci/"

# Infrastructure
- "terraform/"
- "infrastructure/"
- "cdk/"
- "pulumi/"

# Lock files
- "package-lock.json"
- "yarn.lock"
- "poetry.lock"
- "Cargo.lock"

# Generated code
- "generated/"
- "*.gen.ts"
- "*.gen.go"

This list should be maintained by the team and enforced automatically. Manual enforcement doesn't work - the whole point of guardrails is that they're always on, even when humans aren't watching.
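Automatic enforcement can be as simple as checking every changed path against the deny patterns before a change is accepted. This sketch uses Python's `fnmatch`, which is a simplification of full gitignore-style matching, and a trimmed pattern set from the template above:

```python
from fnmatch import fnmatch

# Illustrative subset of the deny template; real pattern semantics may differ.
DENY = [".env", ".env.*", "*.key", "*.pem",
        "migrations/*", ".github/workflows/*", "terraform/*"]

def denied(path, patterns=DENY):
    """True if the path matches any deny pattern."""
    return any(fnmatch(path, pat) for pat in patterns)

for path in ["src/auth/login.py", ".env", "migrations/0042_add_index.sql"]:
    status = "DENY" if denied(path) else "ok"
    print(f"{status:4} {path}")
```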

Implementing Guardrails with ExoProtocol

ExoProtocol provides a governance kernel that implements all four guardrail dimensions:

1. Initialize governance in your project:

pip install exoprotocol
cd your-project
exo init

2. Define your constitution with deny patterns, budgets, and scope:

exo compile

3. Generate agent configuration from governance state:

# Generates CLAUDE.md with deny patterns, budgets, lifecycle commands
exo adapter-generate --target claude

# Also generates .cursorrules and AGENTS.md
exo adapter-generate --target cursor
exo adapter-generate --target agents

4. Every agent session is governed:

exo session-start --ticket AUTH-42
# ... agent works within constraints ...
exo session-finish --session-id <id> --drift-threshold 0.5

5. PR checks enforce governance at merge time:

exo pr-check --base main

The PR check reports drift scores, scope violations, ungoverned commits, and overall verdict. Install the ExoProtocol GitHub App to get this report automatically on every pull request.
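If you prefer to wire the check yourself rather than use the GitHub App, a workflow along these lines would run it on every pull request. This is a hypothetical CI configuration, not an official ExoProtocol workflow:

```yaml
name: governance
on: pull_request
jobs:
  pr-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # pr-check diffs against the base branch
      - run: pip install exoprotocol
      - run: exo pr-check --base main
```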

The Cost of Doing Nothing

Teams that skip guardrails pay a hidden tax:

  • Longer review cycles. Every AI-generated PR requires careful manual inspection because there's no signal about what's intentional vs. incidental.
  • More production incidents. Ungoverned changes slip through review and cause regressions.
  • Slower onboarding. New team members can't distinguish agent-generated code from human-written code, making the codebase harder to understand.
  • Security exposure. Without deny patterns, it's only a matter of time before an agent touches a sensitive file.

Guardrails eliminate these costs. They're a one-time setup that pays dividends on every AI-generated change for the life of the project.

Start Today

You don't need to implement everything at once. Start with the highest-impact guardrail:

  1. Add deny patterns. Create a CLAUDE.md (or .cursorrules) with a deny list of sensitive files. This alone prevents the most dangerous class of agent mistakes.
  2. Start governing sessions. Use exo session-start and exo session-finish to create an audit trail.
  3. Enable PR checks. Run exo pr-check in CI to catch ungoverned changes before they merge.

Your AI agent is your most active contributor. It deserves the same governance as any other team member - clear boundaries, documented expectations, and accountability for its work.

Add guardrails. Your future self will thank you.

Get started at exoprotocol.dev

Ready to govern your AI-written code?

Install ExoProtocol in 30 seconds. Your next PR will have a governance report.