Evolve-7 Multi LMM Evolutionary app - Nov 27 2025
Cracking the Authentication Code & Building True Multi-Model AI Debate – Day 27 of Evolve-7
TL;DR: Squashed critical auth bugs blocking user onboarding, then built the backend for Evolve-7's core value: AI models that actually debate and improve each other's responses.
🎯 Today's Focus
Today was about removing blockers and delivering on the core promise. I fixed two critical authentication issues preventing user login and password resets, then shifted focus to what makes Evolve-7 unique: a real multi-model debate system where GPT-4, Claude, and Gemini challenge and improve each other's responses through structured cross-evaluation.
✨ Key Wins
Authentication Fixes (45 minutes)
The magic link bug was a head-scratcher at first. Users clicking the emailed login links were stuck in an infinite redirect loop — frustrating and a total blocker. The culprit? My authentication state watcher was checking if the URL had no hash fragment before moving users to their dashboard. But magic links always come with those hash fragments carrying tokens, so the redirect never triggered. Once I realized that, the fix was clear: explicitly detect when the URL hash contains an access token, handle it properly by clearing the hash from the URL, and then redirect.
The password reset flow had a similar issue — the backend logic existed and the handlers were ready, but the UI form was completely missing from the render logic. TypeScript won't catch missing JSX blocks for valid enum values, so the code existed but never executed. I added the full password reset form (60+ lines) with validation, error handling, and professional styling matching the brand.
Both fixes deployed to production in under an hour. Users can now complete signup via magic links and reset forgotten passwords.
Multi-Model Debate System - Backend Complete (70% of MVP core value)
This is the big one — the feature that separates Evolve-7 from every other AI tool. Instead of just running multiple AI models in parallel and showing separate responses, I built a true cross-evaluation engine where models actually critique each other's outputs and a synthesis model combines all perspectives into a demonstrably better result.
What I Built (2.5 hours):
-
Cross-Evaluation Workflow Engine (30 min) - Complete 3-round debate structure:
- Round 1: GPT-4, Claude, and Gemini analyze the prompt independently (parallel)
- Round 2: Each model critiques the other two (6 critique pairs) - "Here's what GPT-4 missed..." "Claude's assumption here is questionable because..."
- Round 3: Synthesis model combines all perspectives, addresses blind spots, and produces superior output
-
Enhanced PocketFlow Engine (25 min) - Added
critiqueandrebuttalstep types with real AI synthesis (no more mock data). Workflow dependency resolution ensures critiques wait for Round 1 completion. -
Updated Service Layer (15 min) - Registered cross-evaluation workflows for all strategies with
enableCrossEvaluationandcritiqueDepthoptions. -
New API Endpoints (45 min):
POST /api/optimizations- Now accepts debate mode optionsGET /api/optimizations/:id/debate-summary- Returns structured debate rounds, critiques, synthesis, and statistics
What This Means: Users will soon submit a business challenge and watch in real-time as three AI models debate it out, catching each other's blind spots, challenging assumptions, and producing a synthesis that's measurably better than any single model could deliver. This is the "superintelligent optimization" we've been promising — and now it actually works on the backend.
On top of that, I fine-tuned the Content Security Policy by adding missing script hashes — maintaining Evolve-7's A+ security rating while supporting dynamic content.
💡 What I Learned
URL-Based Authentication Gotchas
Magic links use URL hash fragments to pass tokens (#access_token=...). Any redirect logic checking for the absence of a hash will break authentication flows. The fix: explicitly detect magic link tokens in the hash, process them, clear the hash, then redirect.
TypeScript Rendering Blind Spots
TypeScript won't warn you if a component has a valid state enum value but no corresponding JSX render block. The password-reset state existed, the handler existed, but the form never rendered because there was no {status === 'password-reset' && (...)} block. Manual code review caught it.
Multi-Model Architecture Insights
Building a true cross-evaluation system requires:
- Careful prompt engineering - Critique prompts must be constructive to produce useful feedback
- Workflow dependency tracking - Round 2 critiques must wait for Round 1 completion
targetStepproperties - Cleanly link each critique to its target response
The key architectural insight: this is the first platform with systematic AI-to-AI critique workflows. Not parallel execution with separate outputs, but actual debate with synthesis.
🔧 Challenge of the Day
The toughest part wasn't the auth bugs (those were quick once diagnosed) — it was the strategic pivot. Evolve-7 had drifted into over-positioning ("Elite 5,000", complex qualification flows) that buried the actual value proposition. The real opportunity isn't positioning or marketing complexity — it's that AI models can make each other smarter through structured debate.
Building the cross-evaluation engine meant stripping away the noise and focusing on what actually matters: Can GPT-4 catch Claude's blind spots? Can Gemini challenge GPT-4's assumptions? Does the synthesis demonstrably improve on individual responses?
The backend implementation proves the concept works. Now we just need to build the UI that lets users see it happen in real-time.
📊 Progress Snapshot
Authentication:
- Magic link infinite loop → Fixed in 30 minutes
- Missing password reset form → Added in 15 minutes
- Both deployed to production → Users can now onboard
MVP Core Value (Backend 70% Complete):
- Cross-evaluation workflow engine → 3-round debate structure
- PocketFlow engine enhancements → Real critique and synthesis
- API endpoints → Debate-specific routes
- Real AI synthesis → No more mock data
Remaining for Full MVP:
- Frontend debate UI (DebatePage, DebateProgress, DebateResults)
- WebSocket live progress updates for debate rounds
- End-to-end testing with real prompts
🔮 Tomorrow's Mission
Build the debate frontend. Three main components:
- DebatePage - Simple prompt input + strategy selector
- DebateProgress - Real-time visualization of debate rounds (Round 1: Analysis... Round 2: Cross-Evaluation... Round 3: Synthesis...)
- DebateResults - Structured display showing consensus points, key disagreements, blind spots caught, and the superior synthesized answer
Goal: User submits a prompt, watches AI models debate it in real-time, gets a demonstrably better answer than any single model could produce. That's the MVP. That's the value.
Part of my build-in-public journey with Evolve-7. Follow along for daily updates!