I wrote the rule. Then I skipped it.
Four days ago I published a blog post about a bug I'd nearly shipped. The spec referenced two database fields that did not exist: trade.risk_amount and trade.quantity. A pre-flight check caught it. I wrote up the lesson cleanly: "Before shipping a fix that touches model attributes, grep for the field. Two seconds of work." I even added a standing pre-flight step to the spec review template.
Then on Wednesday, I ran that same check on the fixed spec.
Three more defects. All in the same spec. All shippable if I'd skipped the check. Round 2 of the exact bug class I'd written the lesson about.
This is awkward to type.
The three defects: stop_loss_price in a SQL UPDATE statement (the actual column is stop_loss; stop_loss_price is a Python dict-key alias that appears 36 times in the codebase but is not a DB column); realized_pnl_usd in the operational baseline SQL (the actual column is pnl); and an instruction to "grep INSERT INTO trades in trade_opener.py" that returns zero matches, because the insert goes through a repository layer that builds the SQL dynamically from a Pydantic model.
Any of the three, if shipped, would have produced a silent failure. The stop_loss_price one would have thrown OperationalError inside a surrounding try/except that logs an error and moves on. Trail activation would have silently broken at the exact site the fix was trying to protect. The opposite of what was intended.
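A minimal sketch of that failure shape (the function, table, and logger here are hypothetical, not the actual trail-activation code):

```python
import logging
import sqlite3

log = logging.getLogger(__name__)

def activate_trail(conn: sqlite3.Connection, trade_id: int, new_stop: float) -> None:
    try:
        # The spec said stop_loss_price; the column is stop_loss.
        conn.execute(
            "UPDATE trades SET stop_loss_price = ? WHERE id = ?",
            (new_stop, trade_id),
        )
        conn.commit()
    except sqlite3.OperationalError as exc:
        # Logged and swallowed: the caller never learns the trail was not set.
        log.error("trail activation failed for trade %s: %s", trade_id, exc)
```

The error shows up as one log line among thousands, the trade keeps running without its trail, and nothing upstream notices.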
Why the lesson didn't stick
The April 19 blog post described the check. It did not enforce it. That distinction is the whole point.
In that post I wrote: "every assumption about a live system has a verification cost." True. I also wrote: "I've now added a standing pre-flight step to the spec review template." Also true. What I didn't write, and what I hadn't yet understood, is that adding a step to a template is not the same as the step getting run.
Templates are prose. Prose lives in documents. Documents get read when someone thinks to read them. Under deadline pressure, the memory of "there's a step to do" is exactly the kind of thing that quietly falls off the list. The more specific reason is worse: I was the spec author. I drafted v7 in the same flow I'd drafted v1-v6. By the time I was ready to ship it, the spec was "mine" in a way that makes checking it feel redundant. Of course the fields exist. I just wrote them.
They didn't. And I hadn't.
This is called author-reviewer collapse. If the person who wrote the bug is the person checking for the bug, you're relying on the bug-writing mental model to detect the bug-writing mental model. That doesn't work. Humans solve this with code review by a second set of eyes. For a solo operator, the only equivalent is a mechanical check (grep, PRAGMA, a script) that doesn't care whose mental model is involved.
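To make "mechanical check" concrete, here is a minimal sketch of the kind of thing I mean. It is not the actual command described below; the regex, the src/ path, and the trade. prefix are assumptions, and a real check also has to verify the DB schema (more on that later).

```python
import re
import subprocess
import sys
from pathlib import Path

# Hypothetical sketch: pull every trade.<field> reference out of a spec and
# fail loudly if the identifier doesn't appear anywhere in the codebase.
FIELD_REF = re.compile(r"\btrade\.([a-z_][a-z0-9_]*)")

def spec_fields(spec_path: str) -> set[str]:
    return set(FIELD_REF.findall(Path(spec_path).read_text()))

def appears_in_code(identifier: str, root: str = "src/") -> bool:
    # grep exits 0 if at least one match is found.
    result = subprocess.run(["grep", "-rq", identifier, root])
    return result.returncode == 0

def main(spec_path: str) -> int:
    missing = [f for f in sorted(spec_fields(spec_path)) if not appears_in_code(f)]
    for field in missing:
        print(f"BLOCKER: spec references trade.{field}, zero matches in src/")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

The point is not the regex. The point is that the check does not care who wrote the spec.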
What got caught, what didn't, and the pattern
Three things helped.
First: /sprint-review, the custom command I'd built two weeks earlier for exactly this bug class. It takes a spec, extracts every model-field reference and every DB-column reference, greps them against the actual code, and blocks on any mismatch. On v7 it found two blockers. Worth the 30 seconds.
Second: writing the check into the implementation prompt as Step 0. The prompt literally says "run /sprint-review on the spec before writing any code." That rule fired when I opened a fresh session to implement v7. It's the enforcement of last resort: CLAUDE.md says it, the implementation prompt says it, and both have to fail for the check to be skipped.
Third: accepting that the failure mode was itself the evidence. V7 was drafted with full knowledge that pre-flight existed. It still shipped with three new defects. That means "the rule is in CLAUDE.md" is not sufficient. You need the rule to fire on a trigger that doesn't depend on remembering.
What I added, after the v7 regression, is a second trigger. The rule now says run /sprint-review at spec-publish time, not just at ship time. Publish = the moment you declare a spec ready, append a "ready for ship" changelog entry, or ask for authorisation. Catching a bug at publish costs 30 seconds. Catching it at ship costs four days of turnaround, which is the time between "spec marked ready" and "actually sitting down to implement and discovering the spec is broken."
The whole thing is now instrumented. Every /sprint-review invocation logs one line: date, spec, trigger (publish / ship), outcome. After 2-3 more specs I'll know whether the publish-time trigger is actually firing or whether it's just more prose in CLAUDE.md.
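The logging is deliberately dumb; roughly this, with the path and format as assumptions:

```python
from datetime import date
from pathlib import Path

def log_review(spec: str, trigger: str, outcome: str,
               log_path: str = "logs/sprint_review.log") -> None:
    # One line per invocation: date | spec | trigger (publish/ship) | outcome.
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as fh:
        fh.write(f"{date.today().isoformat()} | {spec} | {trigger} | {outcome}\n")

# e.g. log_review("spec_v7.md", "publish", "2 blockers")
```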
The four things I'd tell you if you're building something similar
One. A rule in documentation is not a rule in practice. The rules that survive deadline pressure are the ones wired to triggers you can't skip by accident. Pre-commit hooks. CI gates. Step 0 of an implementation prompt. Everything else is aspiration. Test your rules by asking: what fires if I don't remember?
Two. Python identifier presence is not DB column existence. These look identical in a bare grep. They are not. Python identifiers can be dict keys, local variables, parameter names, alias fields, property getters. DB columns live in the schema. Verifying one does not verify the other. Run PRAGMA table_info (or your database's equivalent), match column names exactly, and don't trust surrounding code as evidence. There's a sketch of this check right after the list.
Three. Plausibility is not verification. realized_pnl_usd sounds right. Plenty of trading systems use exactly that name. Mine doesn't. If you're working on a system whose naming conventions you didn't establish, your instincts are worth nothing. Trust the local schema. Trust the grep. Distrust the vibe.
Four. Triage is the discipline, not ceremony. When a check surfaces minor issues, the wrong moves are (a) "it's only a minor, ship it" and (b) "patch everything flagged, keep the review rounds going." Both are blanket policies that dodge the actual judgment. The right move is to distinguish spec-level issues (affects decisions, material) from implementation-level notes (self-correcting, cosmetic), patch the first, annotate the second. This sounds obvious. It is not what I was doing two weeks ago, when every minor got a round trip because "professional means rigorous" had quietly slid into "professional means exhaustive."
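Here is the sketch promised in point two, assuming SQLite and an illustrative data/trades.db path:

```python
import sqlite3
import subprocess

def grep_hits(identifier: str, root: str = "src/") -> int:
    # Count matching lines across the codebase (grep -rc prints "path:count").
    out = subprocess.run(["grep", "-rc", identifier, root],
                         capture_output=True, text=True)
    return sum(int(line.rsplit(":", 1)[1]) for line in out.stdout.splitlines())

def is_column(name: str, table: str = "trades",
              db_path: str = "data/trades.db") -> bool:
    rows = sqlite3.connect(db_path).execute(f"PRAGMA table_info({table})").fetchall()
    return name in {row[1] for row in rows}  # row[1] holds the column name

# A field can grep fine as a Python identifier and still not exist in the schema.
print(grep_hits("stop_loss_price"))  # plenty of hits: it's a dict-key alias
print(is_column("stop_loss_price"))  # False: not a DB column
print(is_column("stop_loss"))        # True: the actual column
```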
Where I actually am
The fix shipped yesterday. Four bug sites, including one I found during implementation that wasn't in the spec. Seven unit tests, all green. A migration that adds the missing column and backfills 42 closed trades. Another migration that resets the performance baseline so the validation window starts cleanly.
What I'm watching for in the logs is a single line: [TRAIL_ACTIVATED]. It's there specifically because negative observability ("nothing broke") is not the same thing as positive observability ("the fix path ran"). When it emits on a post-migration trade that hit Stage 1 then TP1, the fix is validated. Until then it's a theory. At current volume that sequence fires maybe once a week.
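The emit site is nothing more than a log call on the success path; a sketch, with the function name and fields as assumptions:

```python
import logging

log = logging.getLogger("trading")

def on_trail_activated(trade_id: int, new_stop: float) -> None:
    # Positive observability: this line only appears if the fixed path actually
    # ran, so validation is a grep for the marker, not an absence of errors.
    log.info("[TRAIL_ACTIVATED] trade=%s new_stop=%s", trade_id, new_stop)
```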
I'll write up whether the publish-time trigger catches anything real, once I have a few more specs' worth of data. For now the interesting thing is that writing the lesson about a verification trap was not enough to avoid the trap. That took rewriting the process so the check fires without anyone remembering to run it.
If you're building something where a single bad assumption can wipe a month of progress — a trading system, a fraud detector, a billing engine, anything with money in the loop — the discipline isn't "be careful." It's "wire the check to a trigger that doesn't care if you're tired."