17 Days of AI Trading: +10.7% Return and the Bug That Just Cost Us $95

Published: February 16, 2026 | 8 min read
#trader-7 #ai-trading #build-in-public #crypto #llm

Trader-7 is an experimental crypto trading system powered by multiple LLMs — DeepSeek generates trade signals, Claude validates them, and a custom strategist reads market regimes. I'm building it solo, in public, to see if an AI pipeline can consistently trade crypto markets. It's currently paper trading (no real money yet).

The Numbers So Far

Trader-7 has been paper trading since January 30. Here's where we stand after 17 days:

| Metric | Value | Assessment |
|---|---|---|
| Return | +10.7% ($3,000 to $3,320) | Strong for 17 days |
| Win Rate | 44.4% | Right where we want it |
| Sharpe Ratio | 3.83 | Exceptional (>2.0 is considered excellent) |
| Total Trades | 36 completed | ~2 trades per day |
| Peak Return | +13.8% ($3,415) on Feb 15 | Before the reversal hit |

A 44% win rate might sound mediocre. It's not. With a 2:1 risk-reward ratio, the break-even win rate is 1 / (1 + 2), or about 33.3%. At 44.4%, the expected value works out to roughly +$0.33 per dollar risked (0.444 × 2 - 0.556 × 1 ≈ 0.33). The system loses more often than it wins, but wins are meaningfully larger than losses. That's the edge.
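
To make that arithmetic concrete, here is a minimal Python sketch. It assumes every win pays exactly 2R and every loss costs exactly 1R, which real fills never do, and the function names are mine for illustration, not Trader-7 code.

```python
def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate at which expected value is zero for a fixed reward:risk ratio."""
    return 1.0 / (1.0 + reward_to_risk)


def expected_value_per_unit_risk(win_rate: float, reward_to_risk: float) -> float:
    """Expected profit per 1 unit risked: wins pay `reward_to_risk`, losses cost 1."""
    return win_rate * reward_to_risk - (1.0 - win_rate)


if __name__ == "__main__":
    # Prints: break-even win rate at 2:1: 33.3%
    print(f"break-even win rate at 2:1: {breakeven_win_rate(2.0):.1%}")
    # Prints: EV at 44.4% win rate: +0.33
    print(f"EV at 44.4% win rate: {expected_value_per_unit_risk(0.444, 2.0):+.2f}")
```

The same two functions show how thin the margin gets if the realized ratio slips: at 1.5:1 the break-even win rate rises to 40%.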

The Sharpe ratio of 3.83 is the standout metric. In traditional finance, anything above 2.0 is considered excellent. Above 3.0 is institutional-grade. This number needs a caveat though — it's calculated over only 17 days with 36 trades. Sharpe ratios stabilize over longer periods, and crypto volatility can inflate short-term numbers. I'd want 3+ months of data before trusting this figure completely. But the direction is encouraging.
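
For reference, here is roughly how a Sharpe ratio can be computed from a series of daily returns. This is a sketch under stated assumptions, not necessarily how Trader-7 arrives at 3.83: it assumes a zero risk-free rate and annualizes with 365 periods per year (crypto trades every day). The function name, parameters, and the sample returns are illustrative only.

```python
import math
import statistics


def sharpe_ratio(daily_returns, periods_per_year=365, risk_free_daily=0.0):
    """Annualized Sharpe ratio: mean excess return / sample std dev, scaled by sqrt(periods)."""
    excess = [r - risk_free_daily for r in daily_returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("zero volatility: Sharpe ratio is undefined")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Made-up daily returns, NOT Trader-7's actual data.
    sample = [0.010, -0.005, 0.020, 0.000, -0.003, 0.012]
    print(f"annualized Sharpe: {sharpe_ratio(sample):.2f}")
```

With only 17 daily observations, both the mean and the standard deviation are noisy estimates, and the sqrt(365) scaling magnifies that noise, which is why the figure needs more data behind it.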


The $95 We Left on the Table

Here's what's been frustrating. Our P&L peaked at +$415 on February 15 — then the market reversed. BTC regime flipped from WEAK_BULL to WEAK_BEAR. The correct move was obvious: close the LONGs, open SHORTs.

The system's AI knew this. DeepSeek started generating SHORT signals immediately. BTC SHORT at 75% confidence. ETH SHORT at 65-90% confidence, 20 consecutive times over the next 14 hours.

None of them got through.

The Claude validator — our final quality gate — was rejecting every single SHORT proposal. The reason? A stale prompt reference from three sprints ago that said the minimum risk-reward ratio was 3.0:1. The actual code threshold had been lowered to 2.0:1 for momentum strategies back in Sprint 96B. But nobody updated the prompt.

So the system sat there. LONGs bleeding in a bear market. AI generating the correct SHORTs. Validator killing them with outdated rules.

It took 14 hours to open our first SHORT position. By then, we'd given back $95 — nearly a quarter of our accumulated profit.


Did We Manage the Losses Well?

This is the honest assessment: mixed.

What worked:

  • The regime watchdog (Sprint 93) correctly identified that our LONG positions opposed the new bear regime. It fired Stage 1 (tightened stop-loss to breakeven) within the first cycle of the regime flip.
  • The position reevaluator (Sprint 89) worked perfectly when given the chance. When a BTC SHORT finally passed validation at 18:02 UTC, the reevaluator detected the conflict with the open BTC LONG and closed it at -$7.62. Small, controlled loss.
  • Max hold time (72 hours) served as the ultimate safety net. When the regime watchdog's Stage 2 close failed due to a database bug, the position was eventually cleaned up by the hold timer.

What didn't:

  • The system held a SOL LONG for 12+ additional hours after the regime flipped because no opposing signal could pass the stale validator.
  • ETH was completely untradeable for 20+ hours — 20 consecutive SHORT proposals rejected.
  • The regime watchdog's Stage 2 close attempt hit a database error (sqlite3.IntegrityError) because the value 'regime_watchdog' was never added to the allowed close reasons. This created a zombie position that kept being tracked cycle after cycle.

The total damage from the delayed response: roughly $95 in P&L drawdown. Not catastrophic, but avoidable.
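
The watchdog failure is easy to reproduce in isolation. The sketch below assumes the allowed close reasons are enforced by a CHECK constraint on a `close_reason` column. The three-column schema, the `close_trade` helper, and the 'take_profit' and 'stop_loss' values are hypothetical stand-ins; only 'opposing_signal' and 'regime_watchdog' come from the actual system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trades (
        id INTEGER PRIMARY KEY,
        symbol TEXT NOT NULL,
        -- Hypothetical list; 'regime_watchdog' is deliberately missing.
        close_reason TEXT
            CHECK (close_reason IN ('take_profit', 'stop_loss', 'opposing_signal'))
    )
    """
)
conn.execute("INSERT INTO trades (symbol) VALUES ('SOL')")
conn.commit()  # persist the open position before attempting any close


def close_trade(conn, trade_id, reason):
    """Record a close reason; return False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE trades SET close_reason = ? WHERE id = ?",
                (reason, trade_id),
            )
        return True
    except sqlite3.IntegrityError as exc:
        print(f"close rejected by database: {exc}")
        return False


if __name__ == "__main__":
    # The write fails, so the row stays open: the "zombie" position.
    close_trade(conn, 1, "regime_watchdog")
    print(conn.execute("SELECT symbol, close_reason FROM trades").fetchall())
```

Because the failed UPDATE is rolled back, the position still looks open on the next cycle, which matches the zombie behavior described above.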


What We Fixed (Sprint 97)

Two changes, deployed today:

1. Prompt R:R Alignment

The root cause was a three-layer mismatch. Three different prompts all referenced 3.0:1 as the minimum R:R, while the actual code enforced 2.0:1 for momentum and 2.5:1 for trend strategies. We aligned all three:

  • Signal Generator prompt (DeepSeek): Now targets 2.5-3.0:1 but accepts 2.0:1 minimum for momentum
  • Validator prompt (Claude Sonnet): No longer rejects based on R:R ratio at all — the code has already verified the minimum. Instead, the validator evaluates whether the target price is reachable given current market conditions
  • Individual signal prompt (DeepSeek fallback): Updated weak trend guidance to match

The key design decision: "aim high, accept good." Tell the AI to generate ambitious targets (2.5-3.0:1), but let the code floor accept anything above 2.0:1. This way the AI still produces quality setups, but viable trades aren't killed by an LLM probabilistically applying a stale threshold.
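
One way to express "aim high, accept good" in code is to keep the thresholds in a single place and render the prompt guidance from those same constants, so the two cannot drift apart. This is a sketch of the idea, not Trader-7's actual implementation: the names `MIN_RR_BY_STRATEGY`, `passes_rr_floor`, and `render_rr_guidance` are hypothetical, and I'm treating the floor as inclusive (a trade at exactly 2.0:1 passes).

```python
# Code-enforced floors (values from the post) and the softer target the LLM is asked for.
MIN_RR_BY_STRATEGY = {"momentum": 2.0, "trend": 2.5}
TARGET_RR_RANGE = (2.5, 3.0)


def risk_reward(entry: float, stop: float, target: float) -> float:
    """Reward:risk ratio for a LONG or SHORT, using absolute price distances."""
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    return abs(target - entry) / risk


def passes_rr_floor(strategy: str, entry: float, stop: float, target: float) -> bool:
    """Deterministic gate: the only place a trade is rejected on R:R."""
    return risk_reward(entry, stop, target) >= MIN_RR_BY_STRATEGY[strategy]


def render_rr_guidance() -> str:
    """Build the prompt sentence from the same constants the gate uses."""
    low, high = TARGET_RR_RANGE
    floors = ", ".join(
        f"{name} {floor:.1f}:1" for name, floor in sorted(MIN_RR_BY_STRATEGY.items())
    )
    return (
        f"Aim for a risk-reward ratio of {low:.1f}-{high:.1f}:1. "
        f"Hard minimums are enforced in code ({floors}); do not reject on R:R alone."
    )


if __name__ == "__main__":
    # A SHORT with 2.2:1 clears the momentum floor but not the trend floor.
    print(passes_rr_floor("momentum", entry=100.0, stop=102.0, target=95.6))
    print(passes_rr_floor("trend", entry=100.0, stop=102.0, target=95.6))
    print(render_rr_guidance())
```

The point of the split is that the LLM only ever sees guidance, while the pass/fail decision on R:R lives in one deterministic function.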

2. Regime Watchdog DB Fix

Added 'regime_watchdog' to the database's allowed close reasons via a full table migration. SQLite doesn't support modifying CHECK constraints in place, so this required recreating the entire trades table — a pattern we've used before (Migration 010 did the same thing for 'opposing_signal').


What We're Monitoring

Sprint 97 just deployed. Here's what success looks like over the next 24-48 hours:

  1. Validator pass rate: Should jump dramatically. We were at ~3.3% (2 approvals out of ~60 proposals in 20 hours). Even a 15-20% pass rate would be a massive improvement.

  2. ETH trade activity: ETH has been completely locked out for 20+ hours. If the validator is working correctly, ETH should become tradeable again when conditions are favorable.

  3. Reevaluator triggers: With more opposing signals passing validation, the reevaluator should fire more frequently during regime transitions. This is the system's primary mechanism for closing positions that are no longer aligned with market direction.

  4. Zero IntegrityError on watchdog closes: The DB migration should eliminate the zombie position problem entirely.
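
The first two checks boil down to a pass rate and a per-symbol rejection streak. Here is a minimal sketch of how both could be computed from a chronological list of validator decisions. The `(symbol, approved)` tuple format and the function name are assumptions for illustration, not Trader-7's actual log schema.

```python
from collections import defaultdict


def validator_stats(decisions):
    """Return (pass_rate, longest_rejection_streak_by_symbol).

    `decisions` is an iterable of (symbol, approved) pairs in chronological order.
    """
    total = 0
    approved_count = 0
    current_streak = defaultdict(int)
    longest_streak = defaultdict(int)

    for symbol, approved in decisions:
        total += 1
        if approved:
            approved_count += 1
            current_streak[symbol] = 0  # an approval resets that symbol's streak
        else:
            current_streak[symbol] += 1
            longest_streak[symbol] = max(longest_streak[symbol], current_streak[symbol])

    pass_rate = approved_count / total if total else 0.0
    return pass_rate, dict(longest_streak)
```

Fed 2 approvals out of 60 proposals, this returns a pass rate of about 0.033, matching the ~3.3% figure above. A long streak for a single symbol is the signature of the ETH lockout.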


The Bigger Picture

Seventeen days in, and the system's core thesis is holding up: an LLM-based trading pipeline can identify and execute profitable trades in crypto markets.

The numbers are good. The win rate and Sharpe ratio are solid. The architecture of multiple AI models checking each other's work — strategist for regime, signal generator for entries, validator for quality — is sound.

But the last 48 hours exposed a critical weakness: the system's ability to adapt to market reversals is only as good as its ability to pass signals through the pipeline. The AI intelligence was there. The market read was correct. The execution was blocked by a prompt that hadn't been updated.

This is the third time in Trader-7's history that an overly restrictive filter has paralyzed trading (Sprint 64, Sprint 96, Sprint 97). Each time, the failure mode was the same: safety constraints interacting to create deadlock. The lesson is clear — every filter in the pipeline needs to be audited whenever thresholds change anywhere in the system.
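
Part of that audit can be automated. As a rough sketch, a test could scan every prompt for hard-coded "N:1" ratios and flag any value the code does not actually use. This is a regex heuristic, not something Trader-7 runs today; the function name and the example prompt strings are hypothetical.

```python
import re

# Matches ratios written like "3.0:1" or "2:1"; the trailing \b avoids times like "12:15".
RR_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*:\s*1\b")


def find_stale_rr_references(prompt_text, allowed_values):
    """Return sorted R:R values mentioned in the prompt that are not in `allowed_values`."""
    found = {float(match.group(1)) for match in RR_PATTERN.finditer(prompt_text)}
    return sorted(found - {float(value) for value in allowed_values})


if __name__ == "__main__":
    code_thresholds = {2.0, 2.5, 3.0}  # floors plus the upper target from the post
    stale_prompt = "Reject any trade with a risk-reward ratio below 3.5:1."
    aligned_prompt = "Aim for 2.5-3.0:1, but accept a 2.0:1 minimum for momentum."
    print(find_stale_rr_references(stale_prompt, code_thresholds))
    print(find_stale_rr_references(aligned_prompt, code_thresholds))
```

A check like this only catches numeric drift. It would not have flagged a prompt that mentions an allowed value in the wrong role, such as 3.0:1 stated as a minimum instead of a target, so it supplements a manual prompt review rather than replacing it.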

The outlook? Cautiously optimistic. Sprint 97 removes the last known prompt-code mismatch. The system is currently holding two SHORT positions (SOL and BTC), correctly aligned with the bear regime. If the validator is now letting the right signals through, we should see faster regime adaptation and fewer missed opportunities.

The real test comes with the next market reversal. When the regime flips again — and it will — we'll find out if the system can rotate positions in hours instead of half a day.


Quick Stats Summary

For those who just want the numbers:

  • 17 days of paper trading: +10.7% return on $3,000
  • Win rate: 44.4% (profitable at 2:1 R:R, where break-even is about 33.3%)
  • Sharpe ratio: 3.83 (above 2.0 is excellent, above 3.0 is exceptional — but early data, needs more time to validate)
  • Drawdown from peak this week: -$95 (22.9% of the +$415 peak profit)
  • Trades per day: ~2.1
  • Current stance: Bear market, 2 SHORT positions open, system aligned with regime
  • Recent changes: Sprint 97 deployed — prompt alignment + watchdog DB fix
  • What improved: Win rate up 0.6pp since Feb 9, P&L up from +$200 to +$320 despite reversal
  • What to watch: Validator pass rate, ETH tradeability, regime transition speed

The system is paper trading. Past performance is not indicative of future results. This is not financial advice.


Building Trader-7 in public. Follow the journey at jamiewatters.work
