
Two Bugs, One Day: How Fixing a Hallucination Revealed the Real Problem

Published: January 3, 2026 · 3 min read
#Crypto #Agent #Progress #Debugging #Claude Code



The Bug That Masked Another Bug

Today we fixed two critical bugs in Trader-7. But here's the twist: we only found the second one because we fixed the first.

This is a story about defense in depth, root cause analysis, and why "the obvious problem" is rarely the real problem.

Bug #1: The Hallucinating AI

Our signal generator (DeepSeek) had developed a peculiar habit. When analyzing Bitcoin, it would occasionally return signals for Ethereum, Solana, or XRP, assets for which it had received zero technical data.

The result? Ghost signals. Decisions based on nothing but the model's imagination.

We implemented a three-layer defense:

  1. Defensive Filter: Catch rogue signals at runtime
  2. Prompt Constraints: Explicitly tell DeepSeek what NOT to analyze
  3. Data Cleanup: Remove phantom assets from input
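The defensive filter (layer 1) boils down to rejecting any signal whose symbol was never in the request. A minimal sketch, assuming signals arrive as dicts with a `symbol` field; the function and field names here are illustrative, not the actual Trader-7 code:

```python
# Layer 1 defensive filter (sketch): drop any signal whose symbol was
# not among the assets we actually sent technical data for.
def filter_ghost_signals(signals, requested_symbols):
    requested = set(requested_symbols)
    kept = []
    for sig in signals:
        if sig["symbol"] in requested:
            kept.append(sig)
        else:
            # A hallucinated asset: log it and refuse to act on it.
            print(f"WARNING: dropping ghost signal for {sig['symbol']}")
    return kept

# Example: we asked about BTC only, but the model also "analyzed" ETH.
signals = [{"symbol": "BTC", "side": "long"},
           {"symbol": "ETH", "side": "short"}]
clean = filter_ghost_signals(signals, ["BTC"])
```

Layers 2 and 3 attack the same problem upstream, so even if the prompt constraint fails, this runtime check is the last line of defense.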

After 14 hours of monitoring: zero hallucinations. Bug squashed.

But then something strange happened.

The Real Bottleneck Emerges

With clean data flowing through the system, we finally saw what was actually happening:

BTC LONG @ 72% confidence
- Passed confidence threshold (70%)
- Passed consensus voting
- Rejected by R:R validation: "got 2.00:1, need 3:1"

Wait. Our system rejected a 72% confidence signal from the entire AI pipeline... because of a hardcoded validation rule?

Bug #2: The Architectural Mismatch

Deep dive into the code revealed a triple mismatch:

Architecture Design (Sprint 44):

  • Range-bound markets: 1.5:1 R:R (quick exits)
  • Trending markets: 3.0:1 R:R (ride the momentum)

Prompt Instruction:

  • "Set TP1 at 2x risk" (fixed, regardless of market)

Validation Code:

  • if rr < 3.0: reject() (hardcoded, ignores regime)

The strategist had identified a range-bound market. The architecture said range-bound trades should use 1.5:1 R:R. DeepSeek provided 2.0:1 R:R.

This was a valid trade that got rejected because our validation ignored market context.

The Fix: Regime-Aware Validation

We implemented what the architecture always intended:

min_rr = {
    'mean-reversion': 1.5,   # Range-bound markets
    'range-bound': 1.5,      # Quick exits at support/resistance
    'momentum': 3.0,         # Ride the trend
    'trend-following': 3.0,  # Larger moves
}.get(regime, 3.0)  # Conservative default

if rr < min_rr:
    raise ValueError(f'R:R must be >= {min_rr}:1 for {regime} regime')

Now the system asks: "What kind of market are we in?" before deciding if a trade's risk/reward makes sense.
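Wrapped as a standalone helper, the check is easy to exercise for both regimes. A quick sketch (the function name `validate_rr` is an assumption for illustration, not the actual Trader-7 API):

```python
# Regime-aware R:R validation as a standalone helper (illustrative).
def validate_rr(rr: float, regime: str) -> float:
    min_rr = {
        'mean-reversion': 1.5,   # Range-bound markets
        'range-bound': 1.5,      # Quick exits at support/resistance
        'momentum': 3.0,         # Ride the trend
        'trend-following': 3.0,  # Larger moves
    }.get(regime, 3.0)           # Conservative default for unknown regimes
    if rr < min_rr:
        raise ValueError(f'R:R must be >= {min_rr}:1 for {regime} regime')
    return min_rr

# The previously rejected BTC trade now passes in a range-bound market...
assert validate_rr(2.0, 'range-bound') == 1.5
# ...while the same 2:1 R:R is still correctly rejected for momentum.
rejected = False
try:
    validate_rr(2.0, 'momentum')
except ValueError:
    rejected = True
assert rejected
```

Note the conservative fallback: any regime the mapping does not recognize still requires 3:1, which is what keeps the change backward compatible.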

The Cascade Effect

Here's what made this interesting:

  1. Hallucination bug prevented us from seeing clean signal data
  2. Clean data revealed signals were being generated correctly
  3. Correct signals were being rejected by broken validation
  4. Broken validation was ignoring regime context we already had

Each fix revealed the next layer of the problem.

Results

Sprint 62 Deployment:

  • 4 files modified (~185 lines)
  • 9 new unit tests (12/12 passing)
  • Regime-aware R:R validation live
  • Backward compatible (defaults to 3:1 for unknown regimes)

Before Fix:

  • BTC @ 72% in range-bound market: REJECTED (2:1 < 3:1)

After Fix:

  • BTC @ 72% in range-bound market: ACCEPTED (2:1 > 1.5:1 minimum)

Key Takeaways

  1. Fix bugs in order: The hallucination fix was a prerequisite for finding the real problem

  2. Architecture vs. Implementation: Our architecture was smart (regime-aware R:R). Our code wasn't following it.

  3. Hardcoded values are traps: Every magic number should have a "why" and a "when"

  4. Defense in depth works both ways: Layers of protection also create layers of debugging

  5. The obvious problem is rarely the only problem: "Low confidence threshold" wasn't the issue. "Hallucinations" wasn't the full issue. The real bottleneck was hidden two layers deep.

What's Next

The system is now deployed and running. We're watching for:

  • Range-bound signals that would have been rejected before
  • Momentum signals still correctly requiring 3:1 R:R
  • Overall signal-to-trade conversion rate

The ghost signals are gone. The validation is intelligent. Now we see if the system actually trades.


Building Trader-7 in public. Follow along for more stories from the trenches of algorithmic trading.
