From Specs to Gates: How Acceptance Criteria Close the AI Accountability Gap

ExoProtocol Team · 8 min read

Tags: acceptance criteria, spec-to-gates, traceability, AI accountability, test coverage, requirements

Your AI coding agent just shipped a feature. The PR is green, tests pass, the drift score is low. But here's the question nobody's asking: does the code actually satisfy the requirements?

Tests passing means the code doesn't crash. It doesn't mean the code does what was specified. That gap between "tests pass" and "requirements met" is where AI-generated code silently goes wrong.

The Problem: Generative Engineering Without Verification

AI coding agents are prolific. Give one a ticket and it'll generate hundreds of lines of production code and test code in minutes. But the tests it writes tend to verify the code it wrote, not the requirements the code was supposed to satisfy. This creates a circular validation problem:

Agent writes code → Agent writes tests for that code → Tests pass → Ship it

What's missing from this loop? Any connection back to the original specification. The agent tested its own implementation, not whether the implementation satisfies the requirements. You can have 100% test coverage and 0% requirement coverage.

This is destructive generative engineering: the appearance of rigor without the substance.

The Chain That Should Exist

In well-governed development, there's a traceability chain:

Requirement → Acceptance Criteria → Test → Code

Every requirement declares what success looks like (its acceptance criteria). Every acceptance criterion has at least one test that verifies it. Every test is traceable back to the criterion it validates. The chain is machine-verified, not a checklist someone fills out manually.

When this chain exists, you can answer real questions:

  • "Is REQ-AUTH-01 actually tested?" Check if its acceptance criteria have @acc: annotations in test files.
  • "What requirements have no test coverage?" Run exo trace-reqs --check-tests and look at the untested_acc violations.
  • "Is this test actually verifying a requirement, or is it self-referential?" If the test has an @acc: annotation pointing to a real acceptance criterion, it's grounded. If not, it might just be testing the agent's own implementation.

How It Works

1. Define acceptance criteria in your requirements manifest

# .exo/requirements.yaml
requirements:
  - id: REQ-AUTH-01
    title: "User authentication"
    status: active
    priority: high
    acceptance:
      - ACC-AUTH-LOGIN    # User can log in with email/password
      - ACC-AUTH-LOCKOUT  # Account locks after 5 failed attempts
      - ACC-AUTH-SESSION  # Session expires after 24 hours

  - id: REQ-AUTH-02
    title: "Password reset flow"
    status: active
    priority: medium
    acceptance:
      - ACC-RESET-EMAIL   # Reset email sent within 30 seconds
      - ACC-RESET-EXPIRE  # Reset link expires after 1 hour

Each requirement lists its acceptance criteria by ID. These are the specific, verifiable conditions that define "done."
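For intuition, the manifest is plain YAML that any parser can read. Here's a minimal sketch of building a requirement-to-criteria map, assuming PyYAML is available (load_acceptance_map is an illustrative helper, not part of ExoProtocol):

```python
import yaml  # assumes PyYAML is installed

def load_acceptance_map(manifest_text: str) -> dict[str, list[str]]:
    """Map each requirement ID to its declared acceptance criteria IDs."""
    manifest = yaml.safe_load(manifest_text)
    return {
        req["id"]: list(req.get("acceptance", []))
        for req in manifest.get("requirements", [])
    }

manifest_text = """
requirements:
  - id: REQ-AUTH-01
    acceptance: [ACC-AUTH-LOGIN, ACC-AUTH-LOCKOUT, ACC-AUTH-SESSION]
  - id: REQ-AUTH-02
    acceptance: [ACC-RESET-EMAIL, ACC-RESET-EXPIRE]
"""
print(load_acceptance_map(manifest_text)["REQ-AUTH-02"])
# ['ACC-RESET-EMAIL', 'ACC-RESET-EXPIRE']
```

The union of all acceptance lists is the "defined" side of the traceability check; the test files supply the other side.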

2. Annotate your tests

# tests/test_auth.py
from datetime import timedelta

from freezegun import freeze_time  # or any time-mocking helper

# `client`, `create_user`, and `login_user` are assumed shared test fixtures.
# @acc: ACC-AUTH-LOGIN
def test_login_with_valid_credentials():
    user = create_user(email="test@example.com", password="secret")
    response = client.post("/login", json={"email": "test@example.com", "password": "secret"})
    assert response.status_code == 200
    assert "token" in response.json()

# @acc: ACC-AUTH-LOCKOUT
def test_account_locks_after_failed_attempts():
    user = create_user(email="test@example.com", password="secret")
    for _ in range(5):
        client.post("/login", json={"email": "test@example.com", "password": "wrong"})
    response = client.post("/login", json={"email": "test@example.com", "password": "secret"})
    assert response.status_code == 423  # Locked

# @acc: ACC-AUTH-SESSION
def test_session_expiry():
    token = login_user("test@example.com", "secret")
    with freeze_time(timedelta(hours=25)):
        response = client.get("/profile", headers={"Authorization": f"Bearer {token}"})
        assert response.status_code == 401

The @acc: annotation is a plain comment, and it's language-agnostic: it works in Python, JavaScript, Go, Rust, Java, or anything with # or // comments. No SDK required.

3. Verify the chain

$ exo trace-reqs --check-tests

Requirement Traceability: FAIL
  requirements: 2 total, 2 active, 0 deprecated, 0 deleted
  code refs: 4
  covered: 2, uncovered: 0
  acceptance criteria: 5 defined, 3 tested
  errors (2):
    - [untested_acc] (manifest): acceptance criteria 'ACC-RESET-EMAIL'
      (from REQ-AUTH-02) has no @acc: annotation in test files
    - [untested_acc] (manifest): acceptance criteria 'ACC-RESET-EXPIRE'
      (from REQ-AUTH-02) has no @acc: annotation in test files

The linter catches that REQ-AUTH-02 has two acceptance criteria with no corresponding tests. No human needs to audit this. The machine tells you exactly what's missing.
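The "defined vs. tested" summary is just set arithmetic over the two sources of truth: criteria declared in the manifest and criteria annotated in tests. A hypothetical reconstruction of that one summary line (coverage_summary is illustrative, not the real implementation):

```python
def coverage_summary(defined: set[str], tested: set[str]) -> str:
    """Mirror the linter's one-line acceptance-criteria coverage summary."""
    return f"acceptance criteria: {len(defined)} defined, {len(defined & tested)} tested"

# The five criteria from the example manifest, three of which have tests.
defined = {"ACC-AUTH-LOGIN", "ACC-AUTH-LOCKOUT", "ACC-AUTH-SESSION",
           "ACC-RESET-EMAIL", "ACC-RESET-EXPIRE"}
tested = {"ACC-AUTH-LOGIN", "ACC-AUTH-LOCKOUT", "ACC-AUTH-SESSION"}
print(coverage_summary(defined, tested))
# acceptance criteria: 5 defined, 3 tested
```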

What Gets Caught

The acceptance criteria tracing system detects two types of violations:

| Violation | Severity | What it means |
|-----------|----------|---------------|
| untested_acc | Error | An acceptance criterion is defined in the manifest, but no test file has an @acc: annotation for it. The spec says something should be verified, but nothing verifies it. |
| acc_orphan | Error | A test file has an @acc: ACC-XXX annotation, but that ACC ID doesn't exist in any requirement's acceptance list. The test claims to verify something that isn't specified. |

Both are errors, not warnings. If your spec says "account locks after 5 failed attempts" and no test verifies that, it's not a suggestion. It's a gap.
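Both violations fall out of comparing the same two sets in opposite directions. A simplified sketch, with check_traceability as an illustrative stand-in for the real linter:

```python
def check_traceability(defined_by_req: dict[str, list[str]],
                       tested: set[str]) -> list[str]:
    """Report untested_acc and acc_orphan violations as message strings."""
    violations = []
    defined = set()
    # Manifest -> tests: every declared criterion must appear in a test.
    for req_id, acc_ids in defined_by_req.items():
        for acc in acc_ids:
            defined.add(acc)
            if acc not in tested:
                violations.append(
                    f"[untested_acc] '{acc}' (from {req_id}) has no @acc: annotation")
    # Tests -> manifest: every annotated criterion must be specified somewhere.
    for acc in sorted(tested - defined):
        violations.append(f"[acc_orphan] '{acc}' is not in any requirement")
    return violations

defined = {"REQ-AUTH-02": ["ACC-RESET-EMAIL", "ACC-RESET-EXPIRE"]}
tested = {"ACC-RESET-EMAIL", "ACC-MYSTERY"}
for v in check_traceability(defined, tested):
    print(v)
# [untested_acc] 'ACC-RESET-EXPIRE' (from REQ-AUTH-02) has no @acc: annotation
# [acc_orphan] 'ACC-MYSTERY' is not in any requirement
```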

Why This Matters for AI-Generated Code

AI coding agents are particularly prone to the accountability gap because they optimize for passing tests, not for satisfying requirements. When an agent generates tests, it writes tests that its own code passes, which is tautologically guaranteed since it wrote both.

With acceptance criteria tracing:

  1. The agent can't skip hard tests. If the manifest says ACC-AUTH-LOCKOUT must be tested, the agent needs to write a test annotated with @acc: ACC-AUTH-LOCKOUT. If it doesn't, the linter catches the gap.

  2. Self-referential tests get caught. If an agent writes a test for something that isn't in the spec, the acc_orphan violation flags it. Tests must be grounded in requirements, not invented by the agent.

  3. Requirement coverage is visible. exo trace-reqs --check-tests gives you a clear number: "5 defined, 3 tested." That's a metric a team can track and improve.

  4. The chain is auditable. From requirement to acceptance criteria to test file, every link is machine-verified. No manual checklists, no trust-based compliance.

The Workflow

Here's how this fits into an ExoProtocol-governed workflow:

1. Human writes requirements + acceptance criteria in .exo/requirements.yaml
2. Agent starts a governed session (exo session-start)
3. Agent implements the feature, writes tests with @acc: annotations
4. Agent finishes (exo session-finish), drift detection + advisory checks run
5. exo trace-reqs --check-tests runs in CI, blocks merge if criteria are untested
6. Human reviews the PR knowing every requirement has verified test coverage

The human defines what "done" looks like. The machine verifies that "done" was achieved. The agent does the work in between, governed and accountable.

Specs Become Gates

The key insight is architectural: your specifications become your merge gates.

A requirement in .exo/requirements.yaml isn't a document someone reads and forgets. It's a live artifact that:

  • Gets traced to code via @req: annotations
  • Gets traced to tests via @acc: annotations on acceptance criteria
  • Gets verified by CI before every merge
  • Gets reported in PR governance checks

When a requirement's acceptance criteria aren't tested, the PR fails. Not because a reviewer noticed, but because the system enforced it. The spec is the gate.

Getting Started

# Install ExoProtocol
pip install exoprotocol

# Initialize governance
exo install

# Add acceptance criteria to your requirements
# (edit .exo/requirements.yaml, add 'acceptance' lists)

# Annotate your tests with @acc: tags
# (add # @acc: ACC-XXX comments in test files)

# Verify the chain
exo trace-reqs --check-tests

No new dependencies. No SDK integration. Just YAML, comments, and a linter. The same deterministic, regex-based approach that powers ExoProtocol's feature tracing and requirement tracing, now extended to close the spec-to-test gap.

Stop shipping code that passes tests but doesn't satisfy requirements. Make your specs enforceable.

Learn more at exoprotocol.dev

Ready to govern your AI-written code?

Install ExoProtocol in 30 seconds. Your next PR will have a governance report.