The Dirty Code Backlog — Why AI Should Learn When to Fix and When to Queue

AI coding agents should act more like senior engineers: fix what matters now, and log the rest as reviewable backlog work.


AI coding agents do not all fail in the same way. Some over-refactor. Others under-refactor.

The problem is not that they all love cleanup. It is that they often lack judgment about when to fix a mess immediately and when to leave a focused note for later.

This distinction matters more as agents write a larger share of our code. You no longer pass through every file yourself: a ten-line change inside a thousand-line file is easy to skim past, and quality problems are quickly forgotten if nobody records them.

The problem is not “refactor” or “don’t refactor”

When an agent is working in a file, it will often spot nearby issues: duplicated logic, confusing names, stale abstractions, missing tests, weak error handling, old TODOs.

Some agents try to clean all of that up immediately. Now the feature PR is bloated and harder to review.

Other agents ignore everything around them. (You know who you are.) The feature ships, but the codebase quietly gets worse.

Both extremes miss what a good senior engineer usually does. They make a call. They fix the issue if it is blocking, risky, or directly tied to the change. Otherwise, they log it so it can be reviewed and scheduled deliberately.

The senior-engineer pattern

The behavior I want from a coding agent is simple:

  • Fix it now when the issue is breaking the task, creating risk, or making the current change unsafe.
  • Log it for review when the issue is real but non-blocking.
  • Stay focused on the work that was actually requested.

That is how experienced engineers keep momentum without letting code quality drift.

They do not turn every feature into a cleanup sprint. They also do not pretend the messy parts are fine. They surface the issue, make it visible, and move on.

If the project has explicit quality standards, the agent should use those standards to decide what gets fixed now and what gets logged for later.

The backlog pattern

When an agent spots dirty code, it should create a reviewable task rather than silently folding cleanup into the current work. At minimum, that means a visible note in the PR, so reviewers can see what was found instead of having to reconstruct what changed and why from the diff alone.

Agent is working on feature X
→ Spots duplicated validation in file Y
→ Decides it is not blocking the feature
→ Creates backlog task: "Extract shared validation from Y"
→ Continues working on feature X

The feature PR stays focused. The code quality issue stays visible. Nothing gets lost.
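The flow above can be sketched as a small helper. This is a minimal illustration of the pattern, not a real agent API: the `Backlog` class, `create_task`, and `handle_finding` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Backlog:
    """Hypothetical backlog: collects non-blocking issues as reviewable tasks."""
    tasks: list = field(default_factory=list)

    def create_task(self, title: str, file: str) -> None:
        # Record the issue instead of folding cleanup into the current diff.
        self.tasks.append({"title": title, "file": file})

def handle_finding(backlog: Backlog, issue: str, file: str, blocking: bool) -> str:
    """Fix now if blocking; otherwise queue it and keep working on the feature."""
    if blocking:
        return "fix-now"   # the current change is unsafe without it
    backlog.create_task(issue, file)
    return "queued"        # visible to reviewers, scheduled by humans

backlog = Backlog()
status = handle_finding(
    backlog, "Extract shared validation from Y", "y.py", blocking=False
)
print(status, backlog.tasks)
```

The key design choice is that the non-blocking branch produces an artifact (a task) rather than a diff, which is what keeps the feature PR small.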

Why this matters with agents

With hand-written code, you usually build a feel for the codebase as you move through it. With agents, that feedback loop gets weaker.

You might review a small diff without noticing what changed in the surrounding file. You might accept a cleanup because it looks harmless, even though it mixed two concerns. Or you might miss a quality issue completely because the agent never surfaced it.

A backlog helps restore that visibility.

  • Review stays intentional: the current change is easier to evaluate on its own.
  • Code health stays visible: the issue is recorded instead of forgotten.
  • Prioritization stays human: you can schedule the work, batch it, or reject it.
  • Agents stay focused: context and effort go to the task you actually asked for.

How to classify dirty code vs. must-fix

Not everything should be queued. Some things should be fixed immediately.

Queue it for review:

  • Code that works but is messy
  • Duplicated logic that is not causing bugs
  • Confusing names that do not block understanding
  • Incomplete cleanup that predates the current work
  • TODOs, FIXMEs, or abstractions that deserve a proper pass

Fix it now:

  • Security vulnerabilities
  • Bugs in the code you are actively modifying
  • Broken logic that prevents the feature from working
  • Changes that would be unsafe to merge without the fix
  • Failing tests or type errors introduced by the work

The rule of thumb is simple: if the issue makes the current change unsafe or incomplete, fix it now. If it is real but non-blocking, log it.
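The rule of thumb can be written down as a tiny triage table. The category names and the mapping below are illustrative, not a standard taxonomy; the one real decision encoded here is the default, which queues anything unknown so a human gets to decide.

```python
# Issue categories that make the current change unsafe or incomplete.
FIX_NOW = {"security-vulnerability", "active-bug", "broken-logic",
           "unsafe-to-merge", "failing-tests"}

# Real but non-blocking issues: record them, do not expand the diff.
QUEUE = {"messy-but-working", "harmless-duplication", "confusing-names",
         "stale-cleanup", "old-todo"}

def triage(kind: str) -> str:
    """Apply the rule of thumb: unsafe or incomplete -> fix now, else log it."""
    if kind in FIX_NOW:
        return "fix-now"
    if kind in QUEUE:
        return "queue"
    # Unknown issue types default to queueing so a human can review them.
    return "queue"

print(triage("security-vulnerability"))  # fix-now
print(triage("harmless-duplication"))    # queue
```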

What this looks like in practice

Before:

Feature PR: "Add user preferences API"
Files changed: 23
- Added preferences endpoints
- Refactored auth middleware
- Renamed variables in user service
- Removed unused imports across 8 files

After:

Feature PR: "Add user preferences API"
Files changed: 3
- Added preferences endpoints

Backlog tasks created for review:
- Refactor auth middleware to extract token validation
- Rename variables in user service for clarity
- Remove unused imports project-wide

Same codebase. Better judgment.

The feature gets reviewed quickly. The cleanup work is still visible. And the developer stays in control of when that cleanup actually happens.

AI should act more like a senior engineer

The goal is not to make agents timid. It is to make them deliberate.

A good senior engineer does not fix every issue they notice, and they do not ignore every issue either. They know when to solve the problem in front of them and when to write down the next one.

Coding agents should work the same way.

If an agent encounters something that is not up to the standard you want, it should surface it as follow-up work. That keeps the codebase healthy without turning every task into a refactor spiral.

Fix what matters now. Log what matters later. Keep both visible.
