When Safety Features Attack: How Four Independent Safeguards Created Total Trading Paralysis

Published: February 18, 2026 · 12 min read
#trader-7 #ai-trading #architecture #debugging #building-in-public

Where We Left Off

Two days ago, I wrote about losing $95 because a stale prompt was blocking valid SHORT signals during a regime flip. Sprint 97 fixed that prompt-code mismatch.

What I didn't realize was that Sprint 97 only scratched the surface. Over the next 48 hours, the system went from "occasionally blocking good trades" to complete trading paralysis. Zero trades opened. Every single proposal rejected. The AI was generating correct signals, the code was accepting them, and then... something was killing them anyway.

The P&L dropped from a +$415 peak to +$72. Three trades with fantasy take-profit targets hit their stop-losses, and the system bled $91.52 in 24 hours.

Here's what happened, and the architectural insight that finally fixed it.


The Problem: When Safety Features Fight Each Other

Trader-7 has multiple safety layers. Each one was designed independently and makes perfect sense on its own:

Layer 1 — ATR Floor (Sprint 95): DeepSeek was placing stop-losses too tight. BTC stops at 1.15% when daily volatility is 2%. That's inside the noise band — 57% chance of getting stopped by random price movement. Fix: widen all stops to at least 1.5x ATR. Correct behavior. Prevents noise stop-outs.

Layer 2 — R:R Minimum (Sprint 96B): Enforce a minimum 2.0:1 risk-reward ratio. If you're risking 2%, you should be targeting at least 4%. Makes sense in isolation — prevents low-quality trades.

Layer 3 — TP Stretching: When the ATR floor widens your stop (increasing risk), and R:R drops below minimum, stretch the take-profit target upward to restore the ratio. Sounds reasonable — maintain your R:R discipline.

Layer 4 — LLM Validator: Claude reviews the final proposal and checks that R:R meets minimum requirements. One more quality gate.

Here's the interaction chain that killed every trade:

1. DeepSeek generates signal: SL = 0.82%, TP = 1.59%, R:R = 1.94:1
2. ATR floor widens stop (correct, prevents noise): SL = 1.23%, TP = 1.59%, R:R = 1.29:1
3. R:R check sees 1.29:1 < 2.0:1, stretches TP (incorrect): SL = 1.23%, TP = 2.46%, R:R = 2.0:1
4. Market never reaches 2.46%; stop-loss hit.
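
The chain above can be reproduced in a few lines. This is a minimal sketch, not Trader-7's actual code: the function names are hypothetical, and the ATR value of 0.82% is an assumption chosen only so that 1.5x ATR lands on the 1.23% stop shown above.

```python
def rr(tp_pct: float, sl_pct: float) -> float:
    """Risk-reward ratio: take-profit distance divided by stop-loss distance."""
    return tp_pct / sl_pct


def apply_atr_floor(sl_pct: float, atr_pct: float, multiple: float = 1.5) -> float:
    """Widen the stop to at least `multiple` x ATR; never tighten it."""
    return max(sl_pct, multiple * atr_pct)


def stretch_tp(tp_pct: float, sl_pct: float, min_rr: float = 2.0) -> float:
    """The old behaviour: push the TP out until the minimum R:R is restored."""
    return max(tp_pct, min_rr * sl_pct)


# Assumed inputs: SL and TP from the example above, ATR picked for illustration.
sl, tp, atr = 0.82, 1.59, 0.82
print(f"signal:    SL={sl:.2f}%  TP={tp:.2f}%  R:R={rr(tp, sl):.2f}")

sl = apply_atr_floor(sl, atr)   # the stop widens, so R:R falls
print(f"ATR floor: SL={sl:.2f}%  TP={tp:.2f}%  R:R={rr(tp, sl):.2f}")

tp = stretch_tp(tp, sl)         # the TP moves away from the LLM's level
print(f"stretched: SL={sl:.2f}%  TP={tp:.2f}%  R:R={rr(tp, sl):.2f}")
```

Each function is defensible on its own. The problem only shows up when the output of one becomes the input of the next.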

In 25 consecutive cycles over Feb 17-18, 100% of proposals had their take-profit stretched. Not a single signal passed the 2.0:1 minimum naturally after the ATR floor.

And 100% of the trades with stretched TPs hit their stop-loss.

The math is damning: in a range-bound market where BTC's 24-hour range is ~2%, a 2.46% take-profit sits beyond the entire day's range. The market simply doesn't move that far.


The Insight: R:R and Win Rate Are Inversely Related

This seems obvious when you say it, but the implications are profound for automated system design.

Stretching a take-profit from 1.59% to 2.46% doesn't improve your expected value. It trades win rate for R:R at roughly an even exchange: whatever you gain in payoff per win, you give back in how often you win.

Random walk math: in a driftless market, the probability of hitting TP before SL is 1 / (1 + R:R). At 2.0:1 that's 33%. At 1.3:1, it's 43%. The expected values are roughly equal.
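
Those percentages follow from the standard first-passage result for a symmetric random walk: with the stop one unit away and the target R units away, the chance of reaching the target first is 1 / (1 + R). A small sketch of that arithmetic, with hypothetical function names and ignoring fees and slippage:

```python
def p_tp_first(rr: float) -> float:
    """P(TP is hit before SL) for a driftless walk, SL at 1 unit and TP at rr units."""
    return 1.0 / (1.0 + rr)


def expected_value_r(rr: float) -> float:
    """Expected P&L per trade in units of risk (R), before fees and slippage."""
    p = p_tp_first(rr)
    return p * rr - (1.0 - p)


for ratio in (1.3, 2.0):
    p = p_tp_first(ratio)
    ev = expected_value_r(ratio)
    print(f"R:R {ratio}:1  P(TP first) = {p:.1%}  EV = {ev:.3f}R")
```

Under this idealized model both expected values come out to zero (up to floating-point rounding), which is the sense in which stretching the TP buys nothing. Any real edge has to come from where the target is placed, not from the ratio.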

But there's a crucial difference: the 1.3:1 trade uses the LLM's actual target — a technically-grounded price level based on support, resistance, and market structure. The 2.0:1 trade uses a mathematically-derived fantasy target that has no technical basis.

The LLM's take-profit placement IS the edge. When you move it, you remove the grounding that makes the signal valuable.


The Fix: Separate Signal Quality from Risk Management

Sprint 99 introduces the pre-ATR signal quality gate. The key insight: evaluate signal quality BEFORE the risk management layer modifies anything.

1. Save the original stop-loss (before ATR floor)
2. Apply ATR floor (widen stop if needed)
3. Calculate pre-ATR R:R = TP / original_stop
4. If pre-ATR R:R < 1.5:1 → REJECT (signal quality insufficient)
5. If pre-ATR R:R >= 1.5:1 → ACCEPT with original TP (no stretching)
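
The five steps can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: the `Proposal` type, the function name, and the percentage-based fields are hypothetical, while the 1.5:1 threshold and the 1.5x ATR multiple are the values quoted in this post.

```python
from dataclasses import dataclass
from typing import Optional

PRE_ATR_MIN_RR = 1.5  # signal-quality threshold (Sprint 99)
ATR_MULTIPLE = 1.5    # ATR floor multiple (Sprint 95)


@dataclass
class Proposal:
    sl_pct: float  # stop-loss distance from entry, in percent
    tp_pct: float  # take-profit distance from entry, in percent


def pre_atr_gate(signal: Proposal, atr_pct: float) -> Optional[Proposal]:
    """Return the risk-adjusted proposal, or None if signal quality is insufficient."""
    original_sl = signal.sl_pct                             # 1. save the original stop
    floored_sl = max(original_sl, ATR_MULTIPLE * atr_pct)   # 2. apply the ATR floor
    pre_atr_rr = signal.tp_pct / original_sl                # 3. R:R on the original stop
    if pre_atr_rr < PRE_ATR_MIN_RR:                         # 4. reject weak signals
        return None
    return Proposal(sl_pct=floored_sl, tp_pct=signal.tp_pct)  # 5. accept, TP untouched


# The earlier example: pre-ATR R:R is 1.94, so it passes with the TP left at 1.59%.
# The atr_pct value is an assumption for illustration.
accepted = pre_atr_gate(Proposal(sl_pct=0.82, tp_pct=1.59), atr_pct=0.82)
print(accepted)
```

The point of the design is that the gate reads only the original stop, so nothing the ATR floor does afterwards can change the accept/reject decision.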

This cleanly separates two concerns:

  • Signal quality: "Did the LLM find a good setup?" (measured by pre-ATR R:R)
  • Risk management: "How much risk should we take?" (ATR floor widens stops)

Post-ATR R:R may be 1.0-1.5:1, and that's fine. That's the cost of noise protection. A reachable target at 1.3:1 is better than an unreachable target at 2.0:1.

The threshold of 1.5:1 was data-justified: in 25 cycles, good signals had pre-ATR R:R of 1.67-2.03:1, bad signals had 0.92-1.00:1. There was a clean gap between 1.0 and 1.67 with zero observations. 1.5:1 sits perfectly in that gap.


The Bug Behind the Fix

Sprint 99 fixed the TP stretching. But then the system STILL wasn't trading. Another debugging session revealed a third enforcement point we'd missed.

The pipeline has three places that check R:R:

  1. main.py pre-ATR gate (Sprint 99: 1.5:1) — fixed
  2. rules_validator.py thresholds (Sprint 99: lowered to 1.0:1) — fixed
  3. Claude's validator prompt — still said "minimum 2.0:1" and "6% TP / 2% SL"

The Claude validator was a free-text LLM prompt with R:R minimums baked into the guidance text. Sprint 99 changed the code but not the prompt. Claude was reading its instructions, seeing "minimum 2.0:1 for range-bound markets," and rejecting every proposal at 1.24:1.

This is the fourth time in Trader-7's history that a prompt-code mismatch has blocked trading. Sprint 82.3, Sprint 96B, Sprint 97, and now Sprint 99.1.

Sprint 99.1 removed all R:R minimum references from the Claude prompt and refocused it on evaluating target reachability — "is this TP achievable given market conditions?" — instead of checking R:R ratios that the code has already validated.
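
One way to catch this class of mismatch earlier is a check that scans prompt text for hard-coded ratios. The sketch below is hypothetical and not part of Trader-7: the function name and the regular expression are assumptions, and it only detects literals written in the "N:1" form.

```python
import re
from typing import List

# Matches ratio literals such as "2.0:1" or "3 : 1".
RR_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\s*:\s*1\b")


def find_hardcoded_rr(prompt_text: str) -> List[str]:
    """Return every R:R-style literal found in a validator prompt."""
    return RR_LITERAL.findall(prompt_text)


stale = 'Reject proposals below minimum 2.0:1 for range-bound markets.'
fixed = 'Is this TP achievable given market conditions?'

print(find_hardcoded_rr(stale))
print(find_hardcoded_rr(fixed))
```

Run against the prompt files in a test suite, a non-empty result would flag a threshold that lives in prose instead of code.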


Also Fixed: The Reevaluator Problem (Sprint 98)

Separately from the R:R saga, Sprint 98 fixed a regime mismatch in the position reevaluator.

The reevaluator uses the local per-asset regime to check if a position's thesis is still valid. But the risk gates use the global BTC-based regime. These frequently diverge — SOL can be locally "trending up" while BTC is globally "WEAK_BEAR."

When this happened, the reevaluator closed a perfectly good SOL SHORT (aligned with global bear regime) because SOL's local regime said "up." Then the replacement LONG was blocked by the defensive stance (which knows about the global bear).

Net result: $112 drawdown from closing correct positions and failing to replace them.

Sprint 98 adds a global regime awareness check: if a position is aligned with the global regime and the opposing signal confidence is below 85%, the reevaluator protects it from closure. Two real activations confirmed working in production.
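
The protection rule can be sketched as a single predicate. This is a minimal illustration, not the reevaluator's actual code: the function names are hypothetical, and mapping regimes to sides by looking for "BEAR" or "BULL" in the label is an assumption (only WEAK_BEAR is named in this post).

```python
PROTECT_BELOW_CONFIDENCE = 0.85  # opposing signals under 85% cannot close the position


def is_aligned(side: str, global_regime: str) -> bool:
    """SHORTs align with bear regimes, LONGs with bull regimes (label naming is assumed)."""
    if "BEAR" in global_regime:
        return side == "SHORT"
    if "BULL" in global_regime:
        return side == "LONG"
    return False


def should_protect(side: str, global_regime: str, opposing_confidence: float) -> bool:
    """True if the reevaluator should leave the position open."""
    return is_aligned(side, global_regime) and opposing_confidence < PROTECT_BELOW_CONFIDENCE


# A SOL SHORT during a global WEAK_BEAR, challenged by a 70%-confidence local LONG signal.
print(should_protect("SHORT", "WEAK_BEAR", opposing_confidence=0.70))
```

Note that the local per-asset regime does not appear in the predicate at all; it only reaches the decision through the opposing signal's confidence.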


The Numbers

| Metric | Feb 16 (Last Blog) | Feb 18 (Today) | Change |
|---|---|---|---|
| P&L | +$320 (+10.7%) | +$72 (+2.4%) | -$248 |
| Win Rate | 44.4% | 35.6% | -8.8pp |
| Sharpe Ratio | 3.83 | 0.75 | -3.08 |
| Total Trades | 36 | 45 | +9 |
| Open Positions | 2 SHORTs | 0 | All closed |

The drawdown is real and significant. Most of it came from three trades with stretched TPs hitting stop-loss (-$91.52) plus the reevaluator closing regime-aligned positions (-$112 over previous days).

But the system damage is now identified and fixed. The pipeline is flowing correctly — the post-99.1 log shows Claude approving a proposal at 1.13:1 R:R (impossible before the fix) and rejecting another for a legitimate strategy contradiction (the validator doing its actual job).


The Pattern

This is the third blog post in a row where the main story is "safety features interacting to create deadlock." Sprint 64, Sprint 96, Sprint 97, Sprint 99 — all the same failure mode.

The lesson I keep relearning: every safety feature in a multi-layer pipeline must be designed with awareness of every other safety feature. They can't be independent.

An ATR floor changes the R:R, which triggers the R:R check, which stretches the TP, which the validator evaluates. Each component is correct individually; the system fails collectively.

Sprint 99's pre-ATR gate is the first fix that addresses this architecturally rather than just adjusting thresholds. By evaluating signal quality before risk management modifies the signal, the two concerns can't fight each other.

That's the pattern I should have seen months ago.


What's Next

The system is currently in a WEAK_BEAR regime with all assets deeply oversold (RSI 29-34). DeepSeek is generating SHORT signals into oversold territory, which Claude is correctly flagging as contradictory (you don't short oversold conditions in a mean-reversion strategy).

The real test of Sprint 99 comes when:

  1. The market presents a clear directional opportunity (not contradictory oversold SHORTs)
  2. A signal passes with post-ATR R:R of 1.0-1.5:1 (which would have been impossible before)
  3. We see whether the realistic TP produces a higher win rate than the fantasy TPs

That data should come within the next 24-48 hours as conditions evolve.

The system is paper trading. Past performance is not indicative of future results. This is not financial advice.


Building Trader-7 in public. Follow the journey at jamiewatters.work