
How to Review AI-Generated Pull Requests

8 min read · ExoProtocol Team


Your team adopted AI coding tools three months ago. Velocity doubled. PRs tripled. And now your review queue looks like a disaster zone - 47 open PRs, most with 500+ line diffs, all generated in a fraction of the time it takes to review them.

You're not alone. Every engineering team using AI coding agents faces the same asymmetry: code generation is now 10x faster than code review. The tooling for writing code with AI has leaped forward. The tooling for reviewing it hasn't kept up.

This guide covers what to look for in AI-generated PRs, why traditional review practices fall short, and how to automate the parts that machines can handle so humans can focus on what they're uniquely good at.

Why Traditional Code Review Fails for AI Code

Traditional code review assumes a mental model where a human developer makes deliberate, incremental changes. The reviewer reconstructs the author's intent from the diff, checks for correctness, and approves. This model breaks down with AI-generated code for three reasons:

Volume. An AI agent can generate a 1,000-line PR in 20 minutes. A human reviewer needs 60-90 minutes to review it properly. When multiple agents are running concurrently, the review backlog grows faster than the team can drain it.

Context loss. The prompt that initiated the work - "add user authentication with JWT" - isn't visible in the PR diff. The reviewer sees the output but not the input, making it impossible to judge whether the agent stayed on task.

Mixed intent. AI agents often bundle requested and unrequested changes into the same commit. The auth feature you asked for is tangled with a logging refactor you didn't ask for. Separating signal from noise takes longer than reviewing the intended change alone.

The 5 Critical Checks for AI-Generated PRs

When reviewing AI-generated code, these five checks catch the most common problems:

1. Scope Creep

The single most common issue with AI-generated PRs. The agent was asked to do one thing and did five things. Look for:

  • Files modified outside the expected directories
  • New dependencies added in requirements.txt, package.json, or similar
  • Configuration changes (CI, linting, build) that weren't requested
  • "While I'm here" refactors mixed in with the intended change

What to do: Check which files changed and whether each one is clearly related to the stated task. If you can't explain why a file was touched, it's probably scope creep.
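This check is easy to mechanize: compare the changed file list (for example, from `git diff --name-only`) against the directories the task was expected to touch. A minimal sketch - `find_out_of_scope` and the allowed-prefix list are hypothetical, not part of any tool mentioned here:

```python
from typing import Iterable

def find_out_of_scope(changed_files: Iterable[str],
                      allowed_prefixes: Iterable[str]) -> list[str]:
    """Return changed files that fall outside every allowed path prefix."""
    prefixes = tuple(allowed_prefixes)
    return [f for f in changed_files if not f.startswith(prefixes)]

# Example: a PR for an auth task should only touch src/auth/ and tests/auth/
changed = [
    "src/auth/jwt.py",
    "tests/auth/test_jwt.py",
    "src/utils/helpers.py",   # unrelated file -> scope creep candidate
    "package.json",           # dependency change -> worth questioning
]
print(find_out_of_scope(changed, ["src/auth/", "tests/auth/"]))
# -> ['src/utils/helpers.py', 'package.json']
```

Anything this flags isn't automatically wrong - but each flagged file should have an explanation, and "the agent decided to" isn't one.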

2. Ungoverned Changes

Some changes in the PR may not be attributable to any tracked task or session. These are the most dangerous because there's no context for why they exist.

What to do: Look for commits without corresponding ticket references or session identifiers. Orphan commits are red flags - they represent work that bypassed whatever process your team uses.
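A rough version of this check can be scripted against `git log` output. The regexes below are assumptions - they model Jira-style ticket keys (`AUTH-42`) and session IDs of the `s-abc123` shape; adjust them to whatever identifiers your team actually uses:

```python
import re

TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")   # e.g. AUTH-42
SESSION_RE = re.compile(r"\bs-[0-9a-f]+\b")         # e.g. s-abc123

def orphan_commits(commits: list[tuple[str, str]]) -> list[str]:
    """Given (sha, message) pairs, return shas whose message carries
    no ticket reference and no session identifier."""
    return [
        sha for sha, msg in commits
        if not (TICKET_RE.search(msg) or SESSION_RE.search(msg))
    ]

commits = [
    ("a1b2c3d", "fix typo"),                     # no context: orphan
    ("d4e5f6a", "AUTH-42: add JWT validation"),  # ticket-linked
    ("b7c8d9e", "refactor session s-abc123"),    # session-linked
]
print(orphan_commits(commits))  # -> ['a1b2c3d']
```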

3. Style and Pattern Drift

AI agents generate syntactically correct code, but it may not match your team's conventions. Common patterns to check:

  • Error handling style (empty catch blocks that swallow errors, overly broad exception handling)
  • Naming conventions (the agent might use camelCase where your codebase uses snake_case)
  • Architectural patterns (the agent might use a different state management approach than your team standard)
  • Import organization and module structure

What to do: Skim for code that "looks different" from the surrounding codebase. AI-generated code often has a distinctive style - slightly more verbose, more comments than your team typically writes, different idioms.
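Parts of this check can be automated too. As one narrow example, a snake_case codebase can grep added lines for camelCase identifiers - the function and its regex are illustrative sketches, not a complete linter:

```python
import re

# Lowercase run followed by one or more capitalized runs, e.g. authToken
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")

def camel_case_names(added_lines: list[str]) -> list[tuple[int, str]]:
    """Flag camelCase identifiers in added lines -- a drift signal
    in a codebase that standardizes on snake_case."""
    hits = []
    for n, line in enumerate(added_lines, 1):
        for name in CAMEL_CASE.findall(line):
            hits.append((n, name))
    return hits

diff_added = [
    "def get_user(user_id):",
    "    authToken = issue_token(user_id)",  # camelCase: drift
    "    return authToken",
]
print(camel_case_names(diff_added))  # -> [(2, 'authToken'), (3, 'authToken')]
```

In practice your existing linter config (flake8, ESLint, etc.) already encodes most of these conventions - the main step is making sure it runs on AI-generated diffs too.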

4. Security Patterns

AI agents can introduce subtle security issues, especially when they generate boilerplate. Watch for:

  • Hardcoded secrets, tokens, or API keys (even placeholder ones)
  • Overly permissive file operations or network access
  • Missing input validation on new endpoints
  • SQL queries constructed by string concatenation
  • New .env file modifications or secret-adjacent changes

What to do: Search the diff for strings like password, secret, token, key, api_key. Check that no new files in sensitive paths (.env, *.key, config/secrets) were touched.
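The search above can be scripted as a first-pass filter. The patterns and path rules below are illustrative, not exhaustive - a real scan should lean on a dedicated secret scanner such as gitleaks or truffleHog:

```python
import re

SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|secret|token|api[_-]?key)\b\s*[:=]"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key ID
]

def scan_diff(changed_files: list[str],
              added_lines: list[str]) -> list[tuple[str, str]]:
    """Flag sensitive file paths and secret-looking assignments."""
    findings = []
    for path in changed_files:
        if path.endswith((".env", ".key")) or "config/secrets" in path:
            findings.append(("sensitive-path", path))
    for n, line in enumerate(added_lines, 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            findings.append(("possible-secret", f"line {n}: {line.strip()}"))
    return findings

files = ["src/auth/jwt.py", ".env"]
added = ['API_KEY = "sk-live-1234"', "user = fetch_user(request)"]
print(scan_diff(files, added))
# -> [('sensitive-path', '.env'),
#     ('possible-secret', 'line 1: API_KEY = "sk-live-1234"')]
```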

5. Test Coverage

AI agents frequently generate code without proportional test coverage. Sometimes they generate tests but the tests are superficial - they test that the code runs without errors but don't assert meaningful behavior.

What to do: Check the ratio of production code to test code. If the PR adds 300 lines of implementation and 0 lines of tests, that's a problem. If it adds 300 lines of implementation and 50 lines of tests, check whether the tests actually validate behavior or just confirm the function doesn't throw.

A Practical Review Checklist

Use this checklist for every AI-generated PR on your team:

## AI PR Review Checklist

### Scope
- [ ] All modified files relate to the stated task
- [ ] No unexpected dependency changes
- [ ] No CI/CD configuration changes (unless intended)
- [ ] No "bonus" refactors mixed with feature work

### Accountability
- [ ] Every commit links to a ticket or session
- [ ] No orphan commits without context
- [ ] The PR description explains what was requested

### Security
- [ ] No hardcoded secrets or tokens
- [ ] No changes to .env or secret-adjacent files
- [ ] Input validation present on new endpoints
- [ ] No overly permissive file/network operations

### Quality
- [ ] Test coverage proportional to new code
- [ ] Tests assert behavior, not just "no errors"
- [ ] Error handling follows team conventions
- [ ] Code style matches the codebase

### Architecture
- [ ] New patterns are consistent with existing architecture
- [ ] No unnecessary abstractions or wrappers
- [ ] Dependencies are justified and maintained

This checklist is useful, but it's also time-consuming. Checking every item manually on a 500-line diff takes significant effort. That's where automation comes in.

Automating the Boring Parts

Most of the checks above can be partially or fully automated. The key insight is that governance data makes automation possible. When every development session is governed - with scope limits, file budgets, and intent tracking - you can programmatically verify compliance.

ExoProtocol's pr-check command (and the corresponding GitHub App) automates these checks:

$ exo pr-check --base main --head feature-branch

PR Governance Report
====================
Verdict: WARN

Commits: 6 total, 5 governed, 1 ungoverned

Session s-abc123 (AUTH-42):
  Verdict: PASS
  Drift Score: 0.12
  Files: 4/5 budget
  Scope: All files within allowed paths

Session s-def456 (AUTH-43):
  Verdict: WARN
  Drift Score: 0.48
  Scope Violations: 2 files outside allowed paths
    - src/utils/helpers.py
    - config/settings.yaml
  File Budget: 7/4 (175%)

Ungoverned Commits: 1
  a1b2c3d "fix typo" - No matching session

Feature Coverage:
  @feature:auth-jwt - 3 files tagged
  @feature:auth-refresh - 2 files tagged

Requirement Coverage:
  @req:REQ-AUTH-01 - Covered (3 implementations)
  @req:REQ-AUTH-02 - Covered (1 implementation)

When installed as a GitHub App, this report appears directly on the PR as a check run. The verdict - pass, warn, or fail - gives reviewers an immediate signal:

  • Pass (green): All commits governed, low drift, no violations. The reviewer can focus on logic and design.
  • Warn (yellow): Some drift detected or minor scope violations. The reviewer should check the flagged areas.
  • Fail (red): Ungoverned commits, boundary violations, or governance integrity issues. The PR needs investigation before merge.

What Humans Should Still Review

Automation handles compliance. Humans should focus on:

Design decisions. Is the approach correct? Is it the right abstraction? Will it scale? These questions require domain knowledge and judgment that automated tools can't provide.

Business logic correctness. Does the code actually do what the ticket requires? Governance checks can verify scope and budget, but they can't verify that the login flow handles edge cases correctly.

Team knowledge. Is there existing code that does something similar? Is this duplicating functionality? Would a different module be a better home for this logic? These questions require codebase familiarity.

Future maintenance. Will this code be easy to modify in six months? Is it documented enough for a new team member to understand? These are judgment calls.

The goal isn't to replace human review - it's to eliminate the tedious, mechanical checks so human reviewers can spend their limited attention on the things that actually require human judgment.

Setting Up Automated PR Reviews

Getting automated governance checks on your PRs takes less than five minutes:

# Install ExoProtocol
pip install exoprotocol

# Initialize governance in your repo
cd your-project
exo init
exo compile

# Generate agent configuration
exo adapter-generate --target claude
exo adapter-generate --target ci

The CI adapter generates a GitHub Actions workflow that runs exo pr-check on every pull request. For richer integration - inline PR comments, check run status, and team dashboards - install the ExoProtocol GitHub App.

Building a Review Culture for AI Code

The transition to AI-assisted development requires updating your review culture, not just your tools. Some practical steps:

  1. Make governance visible. Every PR should show its governance status. If your team can't see drift scores, they can't act on them.
  2. Set drift thresholds. Agree on what drift score triggers a closer look. A threshold of 0.3 for auto-pass and 0.7 for auto-fail is a reasonable starting point.
  3. Require governed sessions. Make it a team norm that all AI-generated work happens in governed sessions. Ungoverned commits should be the exception that triggers discussion.
  4. Review the governance, not just the code. When the governance report says drift is 0.1 and all files are in scope, your review can focus on logic. When it says drift is 0.6, shift your attention to the flagged areas.
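The threshold policy from step 2 is simple enough to encode directly. A sketch using the 0.3/0.7 starting values suggested above - `drift_verdict` is a hypothetical helper for your own tooling, not an ExoProtocol API:

```python
def drift_verdict(score: float,
                  auto_pass: float = 0.3,
                  auto_fail: float = 0.7) -> str:
    """Map a drift score to a review signal using agreed team thresholds."""
    if score <= auto_pass:
        return "pass"   # reviewer focuses on logic and design
    if score >= auto_fail:
        return "fail"   # investigate before merge
    return "warn"       # check the flagged areas

print([drift_verdict(s) for s in (0.12, 0.48, 0.75)])
# -> ['pass', 'warn', 'fail']
```

Whatever values you pick, write them down: a threshold only changes behavior if everyone reviews against the same one.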

AI code review doesn't have to be a bottleneck. With the right automation and the right focus, your team can review AI-generated PRs faster and more thoroughly than manual review alone ever could.

Stop drowning in AI-generated diffs. Let automated governance handle compliance so your team can focus on what matters.

Learn more at exoprotocol.dev

Ready to govern your AI-written code?

Install ExoProtocol in 30 seconds. Your next PR will have a governance report.