
Crypto Trading Agent - upgrade to reasoning model

Published: December 6, 2025 · 4 min read
#Crypto#Agent#Progress#Claude Code#Chain of Thought

When Your AI Trader Argues With Itself (And You Have to Pick a Winner)

Day 18 of building an AI-powered crypto trading system in public


Today I learned something unexpected about AI reasoning models: they can think brilliantly and still fail to communicate their conclusions.

Why I Upgraded to Chain-of-Thought

Until yesterday, my trading system used DeepSeek's standard chat model (deepseek-chat). It worked fine - you give it market data, it returns a JSON decision:

{"action": "TRADE", "direction": "LONG", "confidence": 85}

Simple. But I had a problem: I couldn't see why it was making decisions. When the AI said "SHORT BTC at 85% confidence," I just had to trust it. Was it seeing something in the RSI? Reacting to the trend? Following the moving averages? Black box.

So I upgraded to deepseek-reasoner - DeepSeek's Chain-of-Thought model. Instead of just answering, it shows its work:

reasoning_content: "Looking at BTC-PERP, the current price of $89,575
is below both SMA20 ($90,100) and SMA50 ($91,200), indicating short-term
weakness. RSI at 43.97 is neutral - not oversold enough for a bounce play,
not overbought for a short. ADX at 22 shows moderate trend strength but
the direction is unclear. Without a clear catalyst or technical setup,
the risk-reward doesn't justify entry..."

content: {"action": "NO_TRADE", "reasoning": "Neutral conditions"}

The idea was transparency: I could audit the AI's thinking, spot flawed logic, and build confidence in its decisions.

What I got instead was a crash course in why reasoning models are fundamentally different beasts.

The JSON Parsing Nightmare

With the chat model, parsing was trivial:

response = api.call(prompt)
decision = json.loads(response.content)

With the reasoner model, everything changed. The response now has TWO fields:

  • reasoning_content: The Chain-of-Thought (thousands of tokens of analysis)
  • content: The final JSON answer
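Splitting the two fields can be sketched as a small helper. This is a minimal sketch, assuming an OpenAI-compatible message object that exposes `reasoning_content` and `content` attributes (as DeepSeek's reasoner responses do); `split_reasoner_response` and the stand-in message are illustrative names, not the actual system's code:

```python
from types import SimpleNamespace

def split_reasoner_response(message):
    """Separate the Chain-of-Thought from the final answer.

    Either attribute may be missing, None, or empty, so default
    both to "" instead of assuming they are populated.
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = getattr(message, "content", None) or ""
    return reasoning, answer

# Stand-in for an API response message:
msg = SimpleNamespace(reasoning_content="RSI is neutral...", content="")
reasoning, answer = split_reasoner_response(msg)
```

Defaulting both fields to empty strings matters more than it looks: as the problems below show, you cannot assume either one is filled in.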

Sounds simple enough. Except:

Problem 1: The answer field is sometimes empty.

The AI would produce 6,000+ characters of brilliant analysis, conclude "conditions don't favor a trade," and then... return nothing in the content field. The reasoning was there. The conclusion was clear if you read it. But the structured JSON response? Gone.

Problem 2: The JSON is sometimes buried IN the reasoning.

Other times, the AI would write its JSON answer as part of the reasoning content, not in the designated answer field. So you'd get:

reasoning_content: "...therefore I recommend: {\"action\": \"NO_TRADE\"}"
content: ""

Problem 3: Truncation kills the answer.

The Chain-of-Thought can be 6-7K tokens. If you set max_tokens=2000 (plenty for a chat model), the reasoning fills the entire budget and the actual answer gets cut off mid-sentence.

Reason: Market conditions show RSI at 43.7 is not yet oversold (<3

That's not a complete thought. The AI ran out of tokens before finishing.
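One cheap guard against truncation, independent of raising the budget globally, is to check the completion's finish reason and retry with more room. A sketch, assuming an OpenAI-compatible API where `finish_reason == "length"` signals the model hit `max_tokens`; `call_api` is a hypothetical wrapper, not part of the actual system:

```python
def ensure_complete(call_api, prompt, max_tokens=2000, cap=10000):
    """Retry with a doubled token budget when the answer was cut off.

    `call_api` is a hypothetical wrapper returning an object with
    `finish_reason` and `content` fields. Stops at `cap` so a model
    that never finishes cannot loop forever.
    """
    while True:
        resp = call_api(prompt, max_tokens=max_tokens)
        if resp.finish_reason != "length" or max_tokens >= cap:
            return resp
        max_tokens = min(max_tokens * 2, cap)  # double the budget and retry
```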

The Fix: Parse Everything

I ended up building a multi-layer extraction system:

import json
import re

def _extract_json_from_reasoning(self, reasoning: str) -> str:
    # Try to find JSON embedded in the reasoning
    json_patterns = [
        r'\{[^{}]*"action"\s*:\s*"[^"]+"\s*[^{}]*\}',
        r'\{[^{}]*"direction"\s*:\s*"[^"]+"\s*[^{}]*\}',
    ]
    for pattern in json_patterns:
        matches = re.findall(pattern, reasoning)
        # Validate it's actual JSON, preferring the last match
        # (the final answer usually comes at the end of the CoT)
        for match in reversed(matches):
            try:
                json.loads(match)
                return match
            except json.JSONDecodeError:
                continue

    # No JSON found - infer from text
    no_trade_indicators = ["NO_TRADE", "skip this", "avoid trading"]
    if any(ind.lower() in reasoning.lower() for ind in no_trade_indicators):
        return '{"action": "NO_TRADE", "reasoning": "Inferred from CoT"}'

    return ""  # Give up

And bumped max_tokens from 2,000 to 10,000 to ensure the answer never gets truncated.
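The layering order matters: try the designated answer field first, and only mine the reasoning when that fails. A sketch of the overall flow, with `parse_decision` as an illustrative name and the reasoning-extraction fallback passed in as a function:

```python
import json

def parse_decision(content, reasoning, extract_from_reasoning):
    """Multi-layer parse: answer field first, then the CoT.

    `extract_from_reasoning` is a pattern-matching fallback like the
    one sketched earlier. The layering is the point: never assume
    the `content` field holds valid JSON, or anything at all.
    """
    # Layer 1: the designated answer field
    if content:
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            pass
    # Layer 2: JSON buried in (or inferred from) the reasoning
    extracted = extract_from_reasoning(reasoning)
    return json.loads(extracted) if extracted else None
```

Returning `None` when every layer fails keeps the "give up" case explicit, so the caller can fall back to the next strategist instead of trading on garbage.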

The Orchestrator Override Bug

Here's where it gets spicy. My system has three strategists in a failover chain:

  1. DeepSeek Reasoner (primary) - Expensive, thorough
  2. Qwen (backup) - Cheaper, faster
  3. Rules-based (fallback) - Dumb but reliable

The intended logic:

  • If DeepSeek says TRADE → execute
  • If DeepSeek says NO_TRADE → respect it, don't trade
  • If DeepSeek fails (API error) → try Qwen
  • If both fail → use rules
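The intended logic above can be sketched as a single chain with an explicit branch for every outcome. This is a minimal sketch, not the actual orchestrator: each strategist is a hypothetical callable that returns a dict with an "action" key or raises on API failure:

```python
def decide(deepseek, qwen, rules):
    """Failover chain with explicit handling for every outcome.

    NO_TRADE from either AI halts the chain entirely; only an API
    *failure* (or an abstention) moves on to the next tier.
    """
    for strategist in (deepseek, qwen):
        try:
            decision = strategist()
        except Exception:
            continue  # API error: fall through to the next tier
        if decision.get("action") == "NO_TRADE":
            return None  # an AI explicitly declined: respect it
        if decision.get("action") == "TRADE":
            return decision
        # anything else (ABSTAIN, malformed): try the next strategist
    return rules()  # last resort: dumb but reliable
```

The key design choice is that NO_TRADE returns immediately instead of falling through, which is exactly the distinction the bug below blurred.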

But I found trades being opened when BOTH AI models said NO_TRADE. The rules-based fallback was overriding two AI systems that explicitly declined to trade.

The bug:

# DeepSeek handling (correct)
if decision == "NO_TRADE":
    return None  # Respect the decision

# Qwen handling (buggy)
if decision == "TRADE":
    return proposal
# Missing: elif decision == "NO_TRADE": return None
else:  # ABSTAIN case - but NO_TRADE falls here too!
    try_rules_fallback()

Qwen's NO_TRADE was falling through to the else branch, triggering the rules-based fallback. The fix was adding explicit handling:

elif decision == "NO_TRADE":
    print(f"Qwen says NO_TRADE - RESPECTING decision")
    return None  # Both AIs declined - don't override with rules

Lesson: Failover logic needs explicit handling for every outcome. "Else" is not your friend.

Simplifying: 7 Coins to 4

The biggest change today wasn't a bug fix - it was admitting I was overcomplicating things.

I was running in "hybrid" mode:

  • 4 perpetual futures (BTC, ETH, SOL, XRP) with leverage
  • 3 spot coins (ADA, LINK, AVAX) without leverage

Seven coins = seven DeepSeek API calls per cycle = 5+ minutes of AI thinking time = real money in API costs.

Looking at the data:

  • The perpetuals: Actionable signals, leverage, liquid markets
  • The spot coins: Consistently returning NO_TRADE

I was paying to analyze coins that never resulted in trades.

New config:

TRADER_MODE=PERPETUAL
PERPETUAL_INSTRUMENTS=BTC-PERP,ETH-PERP,SOL-PERP,XRP-PERP

43% fewer API calls. Focus on what works.

Current State

After today's fixes:

  • 1 open position (SOL SHORT, small profit)
  • 4 perpetuals being analyzed per cycle
  • Clean failover: AI decides, system respects it
  • No more phantom trades from rules override

The system correctly declined all 4 coins this cycle. In a choppy, trendless market (ADX below 15), that's exactly what a good trader should do.

Lessons for Anyone Using Reasoning Models

  1. Response structure is different - You're getting thinking AND answer, not just answer
  2. Budget for the thinking - CoT can be 5-10x longer than the final answer
  3. Parse defensively - The answer might be empty, embedded, or truncated
  4. Explicit state handling - Every decision type needs explicit code paths
  5. Simpler is often better - More coins ≠ more profits

Tomorrow I'll see how the streamlined system performs. The AI is being conservative - hopefully that's wisdom, not missed opportunity.


Building Trader-7 in public. Currently at -$103 P&L after 43 paper trades. The goal isn't to be profitable yet - it's to build a system that can be.

Tech stack: Python, DeepSeek Reasoner, Qwen backup, Railway deployment, SQLite, Streamlit dashboard

[Follow along: @jamiewatters]
