
Sprints 51-60: Building Trader-7 v6.1 - From Vision to Reality

Published: December 29, 2025 · 10 min read
#Crypto #Agent #Progress #Agent-11 #BuildInPublic #Solopreneur #ClaudeCode


December 29, 2025 | AI Trading Systems · Multi-Agent Architecture · Build in Public


The Journey So Far

Ten sprints. Three days. One mission: transform Trader-7 from a single-model trading system into a sophisticated multi-agent architecture that learns, adapts, and collaborates.

Plot twist: Claude originally estimated 68-82 hours of implementation. Using AGENT-11 multi-agent orchestration, we delivered all 10 sprints in 3 days while I was on holiday - less than one full work day of actual effort. That's the power of agentic development.

If you're building with Claude Code, seriously - try AGENT-11. It's free. And if it saves you time like it did for me, buy me a coffee.

On December 25th, I shared the vision for v6.0 - a 5-agent architecture with specialized models for strategy, signals, risk, and sentiment. Today, I'm reflecting on what we actually built, what we learned along the way, and what time will tell us next.

Spoiler: We delivered v6.1, and the system is now live.


What We Delivered: Sprint by Sprint

Sprint 51: OpenRouter Integration

The Foundation

Every multi-agent system needs a unified API gateway. We chose OpenRouter because:

  • Single API key for Claude, DeepSeek, Gemini, and more
  • Automatic fallbacks when models are unavailable
  • Usage tracking across all providers
  • Cost optimization through routing

Key Decision: Abstract the model layer completely. Now switching from Claude Opus 4.5 to a future model is a config change, not a rewrite.
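
As a sketch of what that abstraction buys: OpenRouter exposes an OpenAI-compatible chat endpoint, so routing can live entirely in config. The role names and model IDs below are illustrative assumptions, not Trader-7's actual configuration.

import requests

# Role-to-model routing lives in config, not code.
# (Model IDs are illustrative, not the production config.)
ROLE_MODELS = {
    "strategist": "anthropic/claude-opus-4.5",
    "signals": "deepseek/deepseek-chat",
    "sentiment": "google/gemini-3-flash",
}

def ask(role: str, prompt: str, api_key: str) -> str:
    """Send a prompt to whichever model is configured for this role."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": ROLE_MODELS[role],
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

Swapping in a future model is a one-line change to ROLE_MODELS; nothing downstream needs to know.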


Sprint 52: News & Sentiment Pipeline

Adding Market Awareness

Trader-7 v5.0 was blind to the world. It only saw price and volume. v6.0 needed to understand context.

Built (minimal sketch below):

  • News aggregation from multiple crypto sources
  • Sentiment scoring with Gemini 3 Flash (180ms latency)
  • Narrative detection (is the market bullish, bearish, or neutral?)
  • 6-hour sentiment refresh cycle
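
A minimal sketch of the narrative-detection step and refresh cycle. The thresholds and the fetch/score/store stage names are hypothetical stand-ins, not the production pipeline:

REFRESH_SECONDS = 6 * 60 * 60  # 6-hour sentiment refresh cycle

def classify_narrative(scores: list[float]) -> str:
    """Map per-headline sentiment scores in [-1, 1] to a market narrative."""
    avg = sum(scores) / len(scores) if scores else 0.0
    if avg > 0.2:    # thresholds are illustrative
        return "BULLISH"
    if avg < -0.2:
        return "BEARISH"
    return "NEUTRAL"

def refresh_cycle(fetch_news, score_sentiment, store) -> str:
    """One pass of the pipeline; callers supply the fetch/score/store stages."""
    headlines = fetch_news()                          # aggregate crypto sources
    scores = [score_sentiment(h) for h in headlines]  # e.g. one Gemini call each
    narrative = classify_narrative(scores)
    store(narrative, scores)                          # persist for the other agents
    return narrative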

Future Consideration: Grok's native X access remains compelling, but we're deferring it until the core v6.1 changes are proven. Once we validate the multi-agent architecture delivers results, we'll consider adding Grok for enhanced social sentiment. One variable at a time.


Sprint 53: Strategy/Signal Separation

Macro Alignment

This was the architectural heart of v6.0.

Before: One model doing everything (strategy AND signals), reacting to moment-by-moment data without broader context.

After: Two specialized agents, where signals are informed by macro perspective.

The Strategist (Claude Opus 4.5):

  • Analyzes market regime (trending, ranging, volatile)
  • Provides macro context for decision-making
  • Decides which assets to focus on and why
  • Sets risk appetite based on overall market conditions
  • Runs every 4 hours

The Signal Generator (DeepSeek V3.2):

  • Technical indicator analysis (RSI, ADX, Bollinger Bands)
  • Entry/exit point identification
  • Critically: Signals are filtered through strategist's macro view
  • Cost-effective at high frequency

Why This Matters: The real improvement isn't just specialization - it's that every trade proposal is now aligned with the broader market context. Instead of chasing momentary signals in isolation, the system asks "does this signal make sense given where the market is right now?" Better proposals because they're informed by the bigger picture.
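
A sketch of what that filter looks like in practice. The regime labels and the compatibility table are illustrative assumptions, not the exact production logic:

# Signals that fight the strategist's macro view never become proposals.
COMPATIBLE = {
    "trending_up": {"long"},
    "trending_down": {"short"},
    "ranging": {"long", "short"},  # mean reversion can go either way
}

def macro_aligned(direction: str, regime: str) -> bool:
    """Does this signal make sense given where the market is right now?"""
    return direction in COMPATIBLE.get(regime, set())

def filter_signals(raw_signals, regime: str):
    return [s for s in raw_signals if macro_aligned(s.direction, regime)]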


Sprint 54-55: Risk Manager with Claude

The Guardian

This was the most complex sprint - and the most important.

Built:

  • Claude Opus 4.5 as real-time risk manager
  • Position monitoring every 5 minutes
  • Math verification layer (never trust LLM calculations)
  • Emergency exit protocols

Critical Insight: We originally planned GPT-5.1 for risk management. After testing, Claude Opus 4.5 proved superior for nuanced risk assessment. It understands context better and makes fewer catastrophic errors.

The Math Verification Layer: LLMs are terrible at math. Position sizing, stop loss calculations, take profit levels - all verified programmatically. This isn't optional; it's a hard requirement.
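
A minimal sketch of one such check. The proposal field names and the 1%-of-equity risk rule are assumptions for illustration:

def verify_position_size(proposal: dict, equity: float, risk_pct: float = 0.01) -> bool:
    """Recompute sizing independently; reject the LLM's numbers on mismatch."""
    stop_distance = abs(proposal["entry"] - proposal["stop_loss"])
    if stop_distance <= 0:
        return False  # nonsensical stop; reject outright
    # Risk a fixed fraction of equity per trade (1% here, illustrative)
    expected_size = (equity * risk_pct) / stop_distance
    # Accept only if the model's size is within 1% of the programmatic answer
    return abs(proposal["size"] - expected_size) <= 0.01 * expected_size

If the check fails, the trade is rejected no matter how confident the model sounded.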


Sprint 56: Agentic Reflection

Learning from Experience

Static systems don't survive changing markets. v6.0 needed to learn.

The Daily Reflection Loop (18:00 UTC):

  1. Collect all trade outcomes from the past 24 hours
  2. Compare actual results vs predictions
  3. Identify patterns in successes and failures
  4. Update strategist prompt with lessons learned
  5. Archive insights for future reference

Impact: The system now improves over time. Bad patterns get corrected. Good patterns get reinforced.
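
The skeleton of that loop, as a sketch. The critique and archive callables stand in for the real components, and the trade field names are assumptions:

def daily_reflection(trades, strategist_prompt: str, critique, archive) -> str:
    """Runs at 18:00 UTC; returns the strategist prompt with lessons appended."""
    outcomes = [(t.prediction, t.actual) for t in trades]  # steps 1-2
    lessons = critique(outcomes)                           # step 3: LLM pattern-finding
    archive(lessons)                                       # step 5
    return strategist_prompt + "\n\nLessons learned:\n" + "\n".join(lessons)  # step 4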


Sprint 57: Full System Testing

Trust but Verify

Before going live, we needed validation.

Built:

  • Comprehensive test suite (all components)
  • Baseline performance measurement
  • Ablation testing (disable components to measure impact)
  • Learning curve analysis
  • Cost projections ($26-43/month for full system)

Result: 90%+ test coverage. Every component verified. Ready for production.
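
Ablation testing is the most interesting piece: rerun the same backtest with one component disabled at a time and attribute the performance delta. Roughly, where run_backtest and the component names are hypothetical:

COMPONENTS = ["sentiment", "verbal_feedback", "consensus", "reflection"]

def ablation_report(run_backtest) -> dict[str, float]:
    """Attribute win-rate contribution to each component, one ablation at a time."""
    baseline = run_backtest(disabled=set())           # full system
    report = {}
    for component in COMPONENTS:
        ablated = run_backtest(disabled={component})  # same data, one piece off
        report[component] = baseline - ablated        # positive = component helps
    return report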


Sprint 58: Verbal Feedback System

Research-Backed Improvement

This sprint was inspired by academic research (arXiv:2510.08068) showing that verbal feedback improves LLM performance by +31% without retraining.

Built:

  • Lessons Manager with 20 active lessons cap
  • Daily critique generation
  • Weekly synthesis reports
  • Prioritized lesson injection into prompts

How It Works:

  1. System observes trade outcomes
  2. Generates verbal critiques ("That ETH long ignored the funding rate signal")
  3. Distills critiques into lessons
  4. Injects top lessons into future prompts
  5. Caps at 20 lessons to prevent prompt bloat

The Science: Verbal feedback activates different reasoning pathways than raw data. The model doesn't just see "Trade failed" - it understands WHY.
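
A minimal sketch of the cap-and-inject mechanics. The class shape and priority field are assumptions, not the actual Lessons Manager:

from dataclasses import dataclass, field

@dataclass
class Lesson:
    text: str        # e.g. "That ETH long ignored the funding rate signal"
    priority: float  # how often this lesson's pattern has recurred

@dataclass
class LessonsManager:
    max_lessons: int = 20  # cap to prevent prompt bloat
    lessons: list[Lesson] = field(default_factory=list)

    def add(self, lesson: Lesson) -> None:
        self.lessons.append(lesson)
        # Keep only the highest-priority lessons under the cap
        self.lessons.sort(key=lambda l: l.priority, reverse=True)
        del self.lessons[self.max_lessons:]

    def inject(self, prompt: str) -> str:
        """Prepend active lessons so the model sees WHY past trades failed."""
        bullets = "\n".join(f"- {l.text}" for l in self.lessons)
        return f"Lessons from past trades:\n{bullets}\n\n{prompt}"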


Sprint 59: Agent Collaboration Protocols

From Individuals to Team

Multiple agents mean nothing if they can't collaborate effectively.

Built:

  • Confidence scoring (0.0-1.0) for all agents
  • Inter-agent queries via CollaborationProtocol
  • ConsensusEngine with weighted voting
  • Position sizing based on consensus confidence

How Consensus Works:

  1. Strategist proposes trade direction
  2. Signal Generator confirms entry conditions
  3. Risk Manager validates position sizing
  4. Sentiment Agent provides market context
  5. Weighted vote determines final decision

Key Innovation: Position sizing scales with consensus confidence. High agreement = larger position. Disagreement = smaller position or no trade.

Implementation:

# Confidence-weighted consensus (weights sum to 1.0)
strategist_weight = 0.35
signal_weight = 0.25
risk_weight = 0.30
sentiment_weight = 0.10

# Each agent reports a confidence score in [0.0, 1.0]
final_confidence = (
    strategist.confidence * strategist_weight
    + signal.confidence * signal_weight
    + risk.confidence * risk_weight
    + sentiment.confidence * sentiment_weight
)

# Disagreement shrinks the position; below a floor, no trade at all
MIN_CONFIDENCE = 0.5  # illustrative floor, not the production value
position_size = base_size * final_confidence if final_confidence >= MIN_CONFIDENCE else 0.0

Sprint 60: On-Chain Data Pipeline

Building for the Future

On-chain data provides alpha that technicals and sentiment can't capture:

  • Whale wallet accumulation/distribution
  • Funding rate extremes
  • Open interest changes
  • Liquidation cascades

Built:

  • Glassnode fetcher (exchange netflow, MVRV)
  • CoinGlass fetcher (funding rates, OI, long/short ratios)
  • OnChainAggregator for combined analysis
  • Database persistence for historical tracking
  • Dashboard visualization
  • 63 tests passing

Reality Check: We built the infrastructure, but discovered:

  • Glassnode requires $999/month for API access
  • CoinGlass free tier doesn't include needed endpoints
  • Coinbase uses dated futures (2030 expiry), not true perpetuals

Decision: Infrastructure is ready and tested. Activation deferred until ROI justifies cost. System gracefully falls back to neutral defaults.
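
The fallback is simple but load-bearing. Roughly, with the neutral defaults shown as illustrative values:

NEUTRAL_DEFAULTS = {
    "exchange_netflow": 0.0,   # no net accumulation/distribution signal
    "funding_rate": 0.0,       # no crowding in either direction
    "long_short_ratio": 1.0,   # balanced positioning
}

def fetch_onchain(fetchers) -> dict:
    """Use real on-chain data when a provider is available; else stay neutral."""
    for fetch in fetchers:  # e.g. Glassnode, CoinGlass clients
        try:
            return fetch()
        except Exception:
            continue  # provider down, paywalled, or endpoint missing
    return dict(NEUTRAL_DEFAULTS)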


Lessons Learned

1. Model Specialization Beats Generalization

No single model excels at everything. Claude for strategic thinking. DeepSeek for rapid signals. Gemini for fast sentiment. Use each model where it shines.

2. Never Trust LLM Math

This cannot be overstated. LLMs hallucinate numbers. Every calculation affecting real money must be verified programmatically.

3. Consensus Reduces Risk

LLMs are non-deterministic. The same prompt can produce different answers. Multi-agent consensus smooths out individual model errors.

4. Research Pays Off

The verbal feedback system (+31% improvement) came from reading academic papers. Time spent on research directly improved the system.

5. Build Infrastructure Before You Need It

Sprint 60's on-chain pipeline isn't active yet, but when we're ready, it's there. Building infrastructure during quiet periods prevents rushed implementations later.

6. Graceful Degradation is Essential

When Coinbase didn't have traditional funding rates, the system didn't crash. It fell back to neutral defaults. Every external dependency needs a fallback.

7. Cost Projections Change

Initial v6.0 estimate: $210/month
Final v6.1 reality: $26-43/month

Why? Grok sentiment was less essential than expected. DeepSeek is cheaper than projected. Claude Opus is reserved for critical decisions only.


Expected Performance Benefits

Based on component testing and research:

Improvement Area           | Expected Impact        | Mechanism
Strategy/Signal Separation | +3-5% win rate         | Macro-aligned proposals vs reactive signals
Verbal Feedback System     | +31% decision quality  | Research-backed prompt optimization
Consensus Voting           | +12% accuracy          | Reduces individual model errors
Risk Manager               | -10% drawdown          | Real-time position monitoring
Sentiment Integration      | +0.51 correlation      | Market context awareness

Combined Expected Impact: v6.1 should outperform v5.0 by 6-10 percentage points on win rate, with significantly lower drawdowns.


What Time Will Tell

We've built the infrastructure. We've run the tests. Now comes the real validation: live trading.

What We're Watching:

  1. Does multi-agent consensus actually improve win rate?
  2. Does verbal feedback accumulate into measurable improvement over 30/60/90 days?
  3. Does the reflection loop catch and correct bad patterns?
  4. Are the cost projections accurate in production?
  5. When does on-chain data justify its cost?

Timeline:

  • Week 1-2: Baseline establishment
  • Week 3-4: Early pattern recognition
  • Week 5-8: Trend confirmation
  • Week 9-12: Statistical significance

Commitment: If data shows v6.1 underperforming v5.0 after 60 days, we'll analyze, adapt, or roll back. No ego. Just data.


Current State: v6.1 Live

As of December 29, 2025:

  • Status: HEALTHY
  • Open Positions: 0
  • Recent Win Rate: 75% (last 4 trades)
  • Market Narrative: NEUTRAL
  • All Systems: Operational

The infrastructure is solid. The agents are collaborating. The reflection loop is running.

Now we wait, watch, and learn.


What's Next

Immediate:

  • Continue live monitoring
  • Collect performance data
  • Document any anomalies

Short-term (30 days):

  • First statistical checkpoint
  • Refine verbal feedback lessons
  • Optimize consensus weights if needed

Medium-term (60-90 days):

  • Full performance analysis
  • On-chain data activation decision
  • v6.2 planning based on learnings

Building in Public

Sprints 51-60 Total:

  • 10 sprints delivered
  • 3 days elapsed (while on holiday)
  • Less than 1 work day of actual effort
  • Original estimate: 68-82 hours
  • AGENT-11 efficiency: ~10x faster than manual development
  • 63 on-chain tests + full system coverage
  • 4 new agents integrated
  • 1 major architecture migration

Documentation Created:

  • 10 detailed sprint documents
  • Updated architecture.md
  • Updated product-description.md
  • Complete test suites

This is what building a trading system looks like. Not overnight. Not magic. Systematic iteration, research-backed decisions, and relentless testing.


The system is live. The data is collecting. Time will tell.


Building Trader-7 in public. One sprint at a time.

Follow the build: jamiewatters.work/progress


Twitter/X Post

Building Trader-7 v6.1 in public.

10 sprints shipped in 3 days (while on holiday). Original estimate: 68-82 hours. Actual effort: <1 work day.

Secret weapon: AGENT-11 orchestration github.com/TheWayWithin/agent-11

What we built:

  • Macro-aligned strategy/signal separation
  • Verbal feedback system (+31% research-backed)
  • Confidence-weighted consensus
  • On-chain data pipeline (63 tests)

System is live. Time will validate.

Try AGENT-11. It's free. If it saves you time, buy me a coffee: buymeacoffee.com/jamiewatters

#BuildInPublic #AITrading #AgenticDevelopment #ClaudeCode


LinkedIn Post

10 Sprints in 3 Days: The Power of Agentic Development

I just shipped 10 sprints of AI trading system development in 3 days - while on holiday. Original estimate: 68-82 hours. Actual effort: less than one work day.

How? AGENT-11 multi-agent orchestration: https://github.com/TheWayWithin/agent-11

What We Built (Trader-7 v6.1):

  • Strategy/Signal separation for macro-aligned trade proposals
  • Verbal feedback system (based on research showing +31% improvement)
  • Confidence-weighted consensus across agents
  • On-chain data infrastructure (63 tests passing)
  • Claude Opus 4.5 as risk manager

The Key Insight: Strategy/Signal separation isn't just about specialization. It's about ensuring every trade proposal is aligned with the broader market context. Instead of reacting to momentary signals in isolation, the system now asks "does this signal make sense given where the market is right now?"

Expected vs. Reality:
Initial cost estimate: $210/month
Final implementation: $26-43/month

Thoughtful architecture decisions matter.

What's Next: The system is live. We'll collect data, analyze patterns, and let time validate our hypothesis. Some features (like Grok for X sentiment) are ready to add once we prove the core architecture works.

No ego - just data-driven iteration.

If you're building with Claude Code, try AGENT-11. It's free, and if it saves you time like it did for me, buy me a coffee: https://buymeacoffee.com/jamiewatters

#BuildInPublic #AI #AgenticDevelopment #Trading #ClaudeCode
