Phase 3 Complete: The Recommendation Engine Is Live
The core value is here. ModelOptix now delivers personalized AI model recommendations that save you money.
TL;DR
- ✅ Recommendation Engine live - FitScore algorithm finds cheaper alternatives
- ✅ Opportunities dashboard - See all savings opportunities ranked by ROI
- ✅ Sanity Check - Test models side-by-side before switching
- ✅ Guest access - Try it free before signup (no account needed)
- ✅ Trust Dashboard - Provider transparency scores
- ✅ One-click switches - Accept opportunity, track savings
- 🎯 The aha moment delivered - Time to first savings: < 10 minutes
- 🚀 Try it now - 3 free model comparisons, no signup
What I Shipped: The Value Delivery System
Phase 3 was the big one. This is where ModelOptix stops being a catalog and starts saving you money.
What is ModelOptix? An independent AI model advisor that finds cheaper alternatives to your current models, with zero bias. No affiliate commissions. No vendor partnerships. Just you vs. the market.
The Recommendation Engine (FitScore)
Every night at 5am UTC, ModelOptix scans your AI portfolio and compares your current models to 400+ alternatives using our FitScore algorithm.
The FitScore calculates:
- Cost efficiency (35% weight) - Price per task vs. your budget
- Speed match (25% weight) - Response time vs. your latency needs
- Quality fit (20% weight) - Benchmark scores vs. your requirements
- Trust score (15% weight) - Provider reliability + editorial warnings
- Context compatibility (5% weight) - Token window vs. your prompts
When an alternative scores >10% better than your current model → Opportunity created.
Example opportunity:
- Use Case: "Sentiment Analysis for Trading Signals"
- Current Model: GPT-4 Turbo ($10/1M input, $30/1M output)
- Recommended: Llama 3.1 70B ($0.35/1M input, $0.40/1M output)
- FitScore Improvement: +42%
- Estimated Monthly Savings: $280 (89% cost reduction)
- Quality Score: Equivalent (93.2 vs 93.5)
- Speed: 15% faster average latency
- Recommendation: Strong switch candidate
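For intuition, a monthly-savings figure like the one above falls out of per-million-token pricing and a use case's traffic. A minimal sketch, assuming a purely hypothetical token volume (the real estimate uses your actual usage, so the numbers below won't match the $280 figure exactly):

```typescript
// Hedged sketch: estimate monthly model cost from per-1M-token pricing.
// The volume below is a hypothetical placeholder, not real usage data.
interface Pricing {
  inputPerMillion: number;   // USD per 1M input tokens
  outputPerMillion: number;  // USD per 1M output tokens
}

interface MonthlyVolume {
  inputTokens: number;
  outputTokens: number;
}

function estimateMonthlyCost(pricing: Pricing, volume: MonthlyVolume): number {
  return (
    (volume.inputTokens / 1_000_000) * pricing.inputPerMillion +
    (volume.outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Hypothetical traffic: 20M input + 5M output tokens per month
const volume: MonthlyVolume = { inputTokens: 20_000_000, outputTokens: 5_000_000 };

const gpt4Turbo = estimateMonthlyCost({ inputPerMillion: 10, outputPerMillion: 30 }, volume);    // $350
const llama70b = estimateMonthlyCost({ inputPerMillion: 0.35, outputPerMillion: 0.4 }, volume); // $9
console.log(`Estimated monthly savings: $${(gpt4Turbo - llama70b).toFixed(0)}`); // ~$341
```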
Why this matters: Most teams pick a model in March and never re-evaluate. Six months later, they're overpaying because cheaper models have gotten better. ModelOptix watches the market for you 24/7.
The Opportunities Dashboard
Your mission control for savings.
What you see:
- All opportunities ranked by estimated monthly savings
- Filter by: Status (active/accepted/dismissed), use case, opportunity type
- Sort by: Improvement %, monthly savings, date created
- Each card shows:
  - Model comparison (current → recommended)
  - FitScore improvement percentage
  - Estimated monthly savings in dollars
  - Recommendation reason (cost, speed, quality, trust)
  - One-click "Accept" or "Dismiss" actions
The workflow:
- Log in → See opportunities waiting for you
- Click opportunity → View detailed comparison
- Run Sanity Check → Test with your real prompts
- Accept → ModelOptix switches your model, tracks savings
- Repeat → Keep finding savings as new models launch
Time to first opportunity: < 10 minutes from signup. Add your product, define one use case, wait for the nightly job. Wake up to savings.
Sanity Check: Test Before You Switch
This is the killer feature. Never blindly switch models again.
How it works:
- Click "Run Sanity Check" from any opportunity
- Enter 1-3 sample prompts representative of your use case
- ModelOptix runs both models via OpenRouter API
- See side-by-side results (rough data shape sketched after this list) with:
  - Full model responses
  - Response time comparison
  - Token usage
  - Cost per request
  - Quality assessment
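Roughly speaking, each comparison boils down to the same handful of fields per model. A minimal sketch of that shape; the field names are illustrative assumptions, not the actual schema:

```typescript
// Illustrative shape for one side of a Sanity Check comparison.
// Field names are assumptions, not the actual schema.
interface ModelRunResult {
  modelId: string;     // provider model identifier
  response: string;    // full model response
  latencyMs: number;   // wall-clock response time
  totalTokens: number; // prompt + completion tokens
  costUsd: number;     // cost for this single request
}

interface SanityCheckResult {
  prompt: string;
  current: ModelRunResult;     // your current model
  recommended: ModelRunResult; // the recommended alternative
  qualityNote?: string;        // free-text quality assessment
}
```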
You decide:
- ✅ Recommended model is better → Accept opportunity, save money
- ✅ Current model is better → Dismiss opportunity, keep current
- ✅ It's a tie → Your call based on cost vs. minor quality differences
- ✅ Neither is good → Flag for investigation
Example Sanity Check result:
| Metric | GPT-4 Turbo (Current) | Claude 3.5 Sonnet (Recommended) |
|---|---|---|
| Response Time | 2.4s | 1.8s ⚡ 25% faster |
| Tokens Used | 450 | 420 |
| Cost | $0.0135 | $0.00126 💰 91% cheaper |
| Quality | Excellent summary | Excellent summary, slightly more concise |
Your evaluation: "Recommended model is 25% faster, 91% cheaper, and equally good. Easy accept."
No risk switching. Test first, decide with data.
Guest Sanity Check (No Account Needed)
Want to try ModelOptix before signing up? We got you.
Visit modeloptix.com/try:
- Select any two models from our catalog
- Enter your test prompt
- See side-by-side results
- Submit your preference
Free tier: 3 comparisons before signup. Enough to see the value.
Why we offer this: Traditional SaaS hides the product behind signups. We want you to experience the "aha moment" immediately. Try it, see the difference, then decide if you want ongoing monitoring.
Example guest use case:
- Testing GPT-4o vs Claude 3.5 Sonnet for customer support responses
- Comparing Llama 3.3 70B vs Mistral Large for code generation
- Evaluating Gemini 1.5 Flash vs GPT-4o mini for classification
No credit card. No email. Just compare.
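For the curious, a guest comparison is just an API call carrying the guest token. A hedged sketch; the endpoint path and request body are assumptions, and only the `x-guest-token` header mirrors the session-handling code shown in the deep dive below:

```typescript
// Hedged sketch of a guest comparison call. The endpoint path and body
// shape are assumptions; only the x-guest-token header comes from the
// session-handling code described later.
async function runGuestComparison(guestToken: string) {
  const res = await fetch('https://modeloptix.com/api/guest/sanity-check', { // hypothetical path
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-guest-token': guestToken, // unhashed token kept in localStorage
    },
    body: JSON.stringify({
      modelA: 'openai/gpt-4o',
      modelB: 'anthropic/claude-3.5-sonnet',
      prompt: 'Summarize this support ticket in two sentences: ...',
    }),
  });
  if (!res.ok) throw new Error(`Comparison failed: ${res.status}`);
  return res.json(); // side-by-side responses, latency, and cost
}
```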
Trust Dashboard
We show our work. Every provider gets a trust score based on 6 dimensions:
Trust Tiers:
- Tier A (Score 1.0) - OpenAI, Anthropic, Google - Top trust
- Tier B (Score 0.7) - Mistral, Cohere - Good trust, minor gaps
- Tier C (Score 0.4) - Smaller providers - Use with caution
- Unknown (Score 0.2) - Insufficient data
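In FitScore terms (see the technical deep dive below), these tiers map directly onto the numeric trust factor. A minimal sketch using the scores listed above:

```typescript
// Tier → numeric trust factor, using the scores listed above.
type TrustTier = 'A' | 'B' | 'C' | 'unknown';

const TRUST_SCORES: Record<TrustTier, number> = {
  A: 1.0,       // OpenAI, Anthropic, Google
  B: 0.7,       // Mistral, Cohere
  C: 0.4,       // smaller providers
  unknown: 0.2, // insufficient data
};

const normalizeTrust = (tier: TrustTier): number => TRUST_SCORES[tier];
```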
6 Trust Dimensions:
- Data Handling - Privacy policy, data retention, GDPR compliance
- Transparency - Model cards, benchmark disclosure, pricing clarity
- Security - SOC 2, ISO 27001, penetration testing
- Compliance - Industry certifications, audit trails
- Reliability - Uptime SLA, incident response, status page
- Ethics - Bias testing, content policy, responsible AI practices
Why this matters: Cheapest isn't always best. If you're processing PII or financial data, Tier A providers might be worth the premium. If you're generating blog summaries, Tier C is fine.
Editorial overrides: When we discover issues (e.g., provider security incident), we flag or downrank models until resolved. You see warnings before accepting opportunities.
One-Click Model Switches
Found a good opportunity? Accepting is effortless.
The "Accept" flow:
- Click "Accept Opportunity"
- ModelOptix updates your use case to the recommended model
- Tracks estimated monthly savings to your profile
- Expires other opportunities for that use case (no duplicates)
- Shows success toast: "Switched to Claude 3.5 Sonnet. Estimated savings: $156/month"
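Under the hood, accepting an opportunity is a short transaction along the lines of the sketch below; the helper functions are hypothetical stand-ins for the real data layer, declared as stubs so the sketch type-checks:

```typescript
// Sketch of the accept flow. updateUseCase, recordSavings, and
// expireOpportunities are hypothetical stand-ins for the real data layer.
declare function updateUseCase(useCaseId: string, patch: { currentModelId: string }): Promise<void>;
declare function recordSavings(useCaseId: string, monthlySavings: number): Promise<void>;
declare function expireOpportunities(useCaseId: string, opts: { except: string }): Promise<void>;

interface Opportunity {
  id: string;
  useCaseId: string;
  recommendedModelId: string;
  recommendedModelName: string;
  estimatedMonthlySavings: number;
}

async function acceptOpportunity(opp: Opportunity): Promise<string> {
  // 1. Point the use case at the recommended model
  await updateUseCase(opp.useCaseId, { currentModelId: opp.recommendedModelId });
  // 2. Track the estimated savings against the user's profile
  await recordSavings(opp.useCaseId, opp.estimatedMonthlySavings);
  // 3. Expire other open opportunities for this use case (no duplicates)
  await expireOpportunities(opp.useCaseId, { except: opp.id });
  // 4. Success toast text
  return `Switched to ${opp.recommendedModelName}. Estimated savings: $${opp.estimatedMonthlySavings}/month`;
}
```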
Savings tracking:
- Per use case: "This switch saves $156/month"
- Portfolio-wide: "Total monthly savings: $487 across 6 switches"
- Cumulative: "You've saved $2,920 since joining ModelOptix 6 months ago"
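Those three figures roll up from accepted switches. A rough sketch, assuming a simple `AcceptedSwitch` record (illustrative shape, not the real schema):

```typescript
// Sketch: rolling up savings figures for the dashboard.
// The AcceptedSwitch shape is an assumption for illustration.
interface AcceptedSwitch {
  useCaseId: string;
  monthlySavings: number; // estimated USD per month
  acceptedAt: Date;
}

function portfolioMonthlySavings(switches: AcceptedSwitch[]): number {
  return switches.reduce((sum, s) => sum + s.monthlySavings, 0);
}

function cumulativeSavings(switches: AcceptedSwitch[], asOf: Date): number {
  // Approximate: each switch contributes its monthly savings for the
  // number of full months since it was accepted.
  return switches.reduce((sum, s) => {
    const months = Math.max(
      0,
      (asOf.getFullYear() - s.acceptedAt.getFullYear()) * 12 +
        (asOf.getMonth() - s.acceptedAt.getMonth())
    );
    return sum + s.monthlySavings * months;
  }, 0);
}
```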
Your dashboard shows:
- Products monitored
- Active opportunities
- Accepted switches
- Total savings (the big number)
ROI calculation: If ModelOptix saves you $200/month and costs $9.95/month (Solo tier), you're netting $190/month profit. That's 19x return on investment.
Real-World Example: Trading Platform Case Study
Let's walk through a concrete example.
Scenario: You run TradingAI, a platform with 5 AI use cases:
| Use Case | Current Model | Monthly Cost |
|---|---|---|
| Market sentiment analysis | GPT-4 Turbo | $340 |
| Trading signal generation | Claude 3 Opus | $420 |
| Risk assessment | GPT-4o | $185 |
| News summarization | GPT-4 Turbo | $290 |
| Customer support chat | Claude 3.5 Sonnet | $95 |
| Total | | $1,330/month |
Day 1: You add TradingAI to ModelOptix, define all 5 use cases.
Day 2 (after nightly scan): You wake up to 4 opportunities:
Opportunity 1: Market Sentiment Analysis
- Current: GPT-4 Turbo ($340/month)
- Recommended: Llama 3.1 70B ($45/month)
- Improvement: +38% FitScore
- Savings: $295/month (87% reduction)
- Reason: Classification tasks don't need GPT-4's reasoning depth
Opportunity 2: Trading Signal Generation
- Current: Claude 3 Opus ($420/month)
- Recommended: Claude 3.5 Sonnet ($165/month)
- Improvement: +29% FitScore
- Savings: $255/month (61% reduction)
- Reason: Newer Sonnet model matches Opus quality at lower price
Opportunity 3: Risk Assessment
- Current: GPT-4o ($185/month)
- Recommended: Mistral Large ($78/month)
- Improvement: +31% FitScore
- Savings: $107/month (58% reduction)
- Reason: Mistral Large matches GPT-4o for structured analysis
Opportunity 4: News Summarization
- Current: GPT-4 Turbo ($290/month)
- Recommended: GPT-4o mini ($32/month)
- Improvement: +45% FitScore
- Savings: $258/month (89% reduction)
- Reason: Summarization is a simple task, mini model excels
Customer Support Chat: No opportunity (already using optimal model)
Total Potential Savings: $915/month (69% cost reduction)
Your next steps:
- Run Sanity Check on Opportunity 1 (sentiment analysis)
- Test with 3 real trading scenarios
- Results show Llama 3.1 70B is 92% as accurate, 4x faster
- Accept → Save $295/month
- Repeat for other opportunities
The month-one math:
- Savings after all switches: $915/month
- ModelOptix cost (Solo tier): $9.95/month
- Net savings: ~$905/month
- Annual net savings: ~$10,860
That's what ModelOptix delivers.
Technical Deep Dive (For The Builders)
FitScore Algorithm Architecture
The core recommendation engine uses a weighted scoring system:
// Five factor scores, each normalized 0-1
const factors = {
  cost: normalizeCost(model.pricing, userBudget),
  speed: normalizeSpeed(model.latency, userRequirement),
  quality: normalizeBenchmark(model.score, userMinimum),
  trust: normalizeTrust(provider.tier), // A=1.0, B=0.7, C=0.4
  context: normalizeContext(model.maxTokens, userNeeds)
};

// User priorities → weights (must sum to 100%)
const weights = calculateWeights(userPriorities);
// Examples:
// "cost" priority → {cost: 55%, speed: 22%, quality: 12%, trust: 7%, context: 4%}
// "quality" priority → {quality: 55%, cost: 22%, speed: 12%, trust: 7%, context: 4%}

// Weighted sum (let, not const: editorial overrides may adjust it below)
let fitScore =
  (factors.cost * weights.cost) +
  (factors.speed * weights.speed) +
  (factors.quality * weights.quality) +
  (factors.trust * weights.trust) +
  (factors.context * weights.context);

// Editorial overrides
if (editorialExclude) fitScore = 0; // Never recommend
if (editorialDownrank) fitScore *= 0.5; // 50% penalty
if (editorialFlag) showWarning(); // UI flag, no score penalty
Normalization ranges are cached per factor (5-min TTL) to avoid recalculating min/max across 400+ models on every comparison.
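For illustration, cost normalization might look like the min-max scaling below, with the catalog-wide range cached behind a 5-minute TTL; the exact normalization formula is an assumption, only the caching behavior is described above:

```typescript
// Illustrative min-max normalization with a cached catalog-wide price range.
// Simplified: the real normalizeCost also factors in the user's budget.
interface CachedRange { min: number; max: number; fetchedAt: number; }

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as noted above
let cachedCostRange: CachedRange | null = null;

// Hypothetical catalog query returning the cheapest/most expensive model prices
declare function loadCostRangeFromCatalog(): Promise<{ min: number; max: number }>;

async function getCostRange(): Promise<CachedRange> {
  const now = Date.now();
  if (!cachedCostRange || now - cachedCostRange.fetchedAt > TTL_MS) {
    const { min, max } = await loadCostRangeFromCatalog(); // one scan of 400+ models
    cachedCostRange = { min, max, fetchedAt: now };
  }
  return cachedCostRange;
}

// Lower cost → higher score, clamped into [0, 1]
async function normalizeCostSimple(pricePerMillion: number): Promise<number> {
  const { min, max } = await getCostRange();
  if (max === min) return 1;
  const clamped = Math.min(Math.max(pricePerMillion, min), max);
  return 1 - (clamped - min) / (max - min);
}
```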
Why weighted scores? Different use cases have different priorities. Real-time chat needs speed > cost. Batch processing needs cost > speed. Quality-critical tasks need quality > everything. Users can set priorities per use case.
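The priority-to-weights mapping can be as simple as a lookup over presets like the two shown in the code comments above (presets for any other priorities would be assumptions):

```typescript
// Priority → weights lookup, using the two presets from the comments above.
// Presets for other priorities (e.g. "speed") would be assumptions.
type Factor = 'cost' | 'speed' | 'quality' | 'trust' | 'context';
type Weights = Record<Factor, number>;

const WEIGHT_PRESETS: Record<'cost' | 'quality', Weights> = {
  cost:    { cost: 0.55, speed: 0.22, quality: 0.12, trust: 0.07, context: 0.04 },
  quality: { cost: 0.22, speed: 0.12, quality: 0.55, trust: 0.07, context: 0.04 },
};

function calculateWeights(priority: keyof typeof WEIGHT_PRESETS): Weights {
  return WEIGHT_PRESETS[priority]; // each preset sums to 100%
}
```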
Opportunity Generation Job
Runs daily at 5am UTC (after model/pricing sync at 2-4am):
// Pseudocode
for (const useCase of activeUseCases) {
  const currentModel = useCase.current_model;
  const currentScore = calculateFitScore(currentModel, useCase);
  const alternatives = getCompatibleModels(useCase.requirements);

  for (const altModel of alternatives) {
    const altScore = calculateFitScore(altModel, useCase);
    const improvement = (altScore - currentScore) / currentScore;

    if (improvement > 0.10) { // 10% threshold
      createOpportunity({
        use_case_id: useCase.id,
        current_model_id: currentModel.id,
        recommended_model_id: altModel.id,
        improvement_percentage: improvement,
        estimated_monthly_savings: calculateSavings(useCase.volume),
        evidence: { currentScore, altScore, factors, weights }
      });
    }
  }
}
Prevents duplicates: Expires old opportunities for a use case when new ones are created.
Performance: Processes 1,000 use cases in ~45 seconds (PostgreSQL + pgvector).
Sanity Check OpenRouter Integration
Side-by-side testing via OpenRouter's unified API:
// Run both models in parallel for fairness
const [currentResult, recommendedResult] = await Promise.all([
  openrouter.chatCompletion({
    model: currentModel.provider_model_id,
    messages: [{ role: 'user', content: userPrompt }],
    max_tokens: userSettings.maxTokens || 500,
    temperature: userSettings.temperature || 0.7
  }),
  openrouter.chatCompletion({
    model: recommendedModel.provider_model_id,
    messages: [{ role: 'user', content: userPrompt }],
    max_tokens: userSettings.maxTokens || 500,
    temperature: userSettings.temperature || 0.7
  })
]);

// Store results with latency tracking
await saveSanityCheck({
  current_response: currentResult.content,
  current_latency_ms: currentResult.latency,
  current_tokens: currentResult.usage.total_tokens,
  current_cost: calculateCost(currentResult, currentModel.pricing),
  recommended_response: recommendedResult.content,
  recommended_latency_ms: recommendedResult.latency,
  recommended_tokens: recommendedResult.usage.total_tokens,
  recommended_cost: calculateCost(recommendedResult, recommendedModel.pricing)
});
Cost tracking: OpenRouter returns exact token counts. We calculate costs using our pricing database (synced weekly).
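A cost calculation in that spirit is a one-liner over token counts and per-million prices. A minimal sketch, assuming pricing is stored as USD per 1M input/output tokens (the real `calculateCost` signature may differ):

```typescript
// Sketch: per-request cost from token usage and per-1M-token pricing.
// Assumes pricing is stored as USD per 1M input/output tokens; the real
// calculateCost may take different arguments.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

function requestCost(usage: TokenUsage, pricing: ModelPricing): number {
  return (
    (usage.prompt_tokens / 1_000_000) * pricing.inputPerMillion +
    (usage.completion_tokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Example: 450 prompt + 420 completion tokens on a hypothetical $3 / $15 per-1M model
// → (450 / 1e6) * 3 + (420 / 1e6) * 15 ≈ $0.0077
```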
Guest session handling:
// Guest token: SHA-256 hash stored in DB
const guestToken = generateGuestToken(); // UUID
const hashedToken = sha256(guestToken);
// Store unhashed in localStorage (client)
localStorage.setItem('guest_session_id', guestToken);
// Store hashed in database (server)
await createGuestSession({ guest_session_id: hashedToken });
// Validate on API requests
const submittedHash = sha256(request.headers['x-guest-token']);
const session = await getGuestSession(submittedHash);
if (!session || session.expires_at < new Date()) throw new Error('Invalid session');
7-day TTL, auto-cleanup via cron job.
Trust Score System
Provider trust tiers are manually assigned based on due diligence:
-- providers table
-- tier: 'A' | 'B' | 'C' | 'unknown'
-- Example assignments
UPDATE providers SET tier = 'A' WHERE name IN ('OpenAI', 'Anthropic', 'Google');
UPDATE providers SET tier = 'B' WHERE name IN ('Mistral AI', 'Cohere');
UPDATE providers SET tier = 'C' WHERE name IN ('Together AI', 'Fireworks AI');
Trust score in FitScore:
- Tier A: 1.0 (no penalty)
- Tier B: 0.7 (30% trust penalty)
- Tier C: 0.4 (60% trust penalty)
- Unknown: 0.2 (80% trust penalty)
With a trust weight of 7% (as in the priority presets above), a Tier C model needs to score roughly 4-5 percentage points higher on the other factors to overcome the trust penalty.
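The arithmetic behind that claim, as a quick worked example:

```typescript
// Worked example: weighted trust gap between Tier A (1.0) and Tier C (0.4)
const trustWeight = 0.07;                  // trust weight in the presets above
const penalty = trustWeight * (1.0 - 0.4); // ≈ 0.042
// A Tier C model must make up ~4.2 points (on a 0-100 FitScore scale) on
// cost, speed, quality, or context to break even with a Tier A provider.
```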
Editorial overrides:
CREATE TABLE editorial_overrides (
  model_id UUID REFERENCES models(id),
  override_type TEXT CHECK (override_type IN ('exclude', 'downrank', 'flag')),
  reason TEXT,
  created_by UUID -- admin user
);
- Exclude: Never recommend (e.g., model deprecated, security issue)
- Downrank: 50% FitScore penalty (e.g., known quality regression)
- Flag: UI warning, no score penalty (e.g., beta model, use with caution)
Why Phase 3 Matters: The Independent Advisor You Need
The Problem We're Solving
You're building a product. You added AI features. You picked models based on what was best 6 months ago.
What changed since then:
- Llama 3.3 70B launched (matches GPT-4 Turbo for $0.20/1M vs $10/1M)
- Claude 3.5 Sonnet got a price drop (now cheaper than Claude 3 Opus)
- Gemini 1.5 Flash improved benchmarks (now viable for complex tasks)
- GPT-4o mini was released (1/50th the cost of GPT-4 for simple tasks)
You didn't notice because:
- You're busy shipping features, not monitoring model releases
- 400+ models across 15+ providers - impossible to track manually
- Benchmarking alternatives takes days
- Fear of switching (what if quality drops?)
Result: You're overpaying. Not because you made a bad choice. Because the market moved and you didn't.
The ModelOptix Solution
We monitor the market for you:
- Daily catalog sync (new models, price drops)
- Weekly benchmark updates (quality improvements)
- Nightly portfolio scans (find opportunities)
- Proactive alerts (better model launched for your use case)
We eliminate risk:
- Sanity Check lets you test before switching
- FitScore algorithm considers quality, not just cost
- Trust scores flag risky providers
- Editorial overrides warn about issues
We track your savings:
- Per-switch savings estimates
- Portfolio-wide monthly savings
- Cumulative savings since joining
- ROI dashboard (savings vs subscription cost)
We're independent:
- No affiliate commissions from providers
- No vendor partnerships or sponsorships
- Revenue from subscriptions only
- Our incentive = your savings
You're not our product. Your savings are our product.
What's Next: Phase 4 (Monetization)
The foundation is complete. The value is proven. Now we need to get paid.
Phase 4 deliverables (Target: 5 days):
- Stripe Integration - Full subscription flow
- Pricing Tiers - Free, Solo ($9.95/mo), Growth ($19.95/mo), Pro ($29.95/mo)
- Trial Flow - 7-day free trial with card upfront
- Customer Portal - Manage subscription, billing, invoices
- Tier Limits - Enforce product/use case limits per tier
- Webhook Handlers - Payment failures, subscription changes
- Onboarding Funnel - Landing → Trial → Tier selection → Checkout → Dashboard
The business model:
| Tier | Price | Products | Use Cases | Sanity Checks | Target User |
|---|---|---|---|---|---|
| Free | $0 | 1 | 3 | 0/month | Post-trial downgrade |
| Solo | $9.95/mo | 3 | - | 10/month | Solo founder, side project |
| Growth | $19.95/mo | 10 | - | 30/month | Multi-product solopreneur |
| Pro | $29.95/mo | 25 | - | 100/month | Power user, agency |
Value prop: If Solo tier saves you $200/month, you're paying $9.95 to save $200. That's 20x ROI every month.
Beta launch: 7 days after Phase 4 complete. Beta users get lifetime Solo tier free (150+ on waitlist).
Try ModelOptix Today
The product is live. The value is real. Try it now.
Guest Access (No Signup)
- Compare any 2 models side-by-side
- Test with your real prompts
- See speed and cost differences
- 3 free comparisons
Perfect for:
- "I'm curious how Claude 3.5 compares to GPT-4o"
- "Should I switch from GPT-4 Turbo to Llama 3.3 70B?"
- "Is Gemini 1.5 Flash good enough for my use case?"
Waitlist (Beta Access)
- Join builders waiting for beta
- Get lifetime free Solo tier ($9.95/mo value)
- Early access to all features
- Direct line to me for feedback
Beta launching: ~7 days (after Phase 4 monetization)
Try The Savings Calculator
- Enter your monthly API volume
- Select your primary use case
- Choose your current model
- See estimated savings with ModelOptix
Trader7 case study: $747/month saved (40% cost reduction) while maintaining 96.2% quality.
For Other Builders
Lessons from Phase 3:
1. The Value Moment Must Be Instant
We spent 2 weeks building infrastructure. Users don't care.
Phase 3 delivered the "aha moment" - see a savings opportunity in < 10 minutes. That's what converts.
Build the value delivery first, then the infrastructure to scale it.
2. Remove Friction Ruthlessly
Guest Sanity Check has zero friction:
- No signup
- No email
- No credit card
- Just "pick two models, enter prompt, see results"
Conversion to paid will be higher because users experience value before committing.
Let users try before they buy. It's 2026, signup walls are dead.
3. Show Your Work (Trust Through Transparency)
We expose:
- FitScore calculation methodology
- Provider trust tier assignments
- Editorial override reasons
- Savings calculations (not rounded estimates)
Transparency builds trust. Trust converts.
4. Async Jobs Are Your Friend
The opportunity generation job runs once daily. Users don't see it, but it's the workhorse of the product.
Don't make users wait. Pre-compute expensive operations, serve cached results.
5. Weighted Algorithms > Binary Rules
FitScore doesn't say "this model is better." It says "this model scores 87/100 for YOUR use case with YOUR priorities."
Personalization converts better than generic recommendations.
The Build-in-Public Numbers
Phase 3 Stats:
- Duration: 2 days (Jan 21-22)
- Tasks completed: 12/15 (2 P1 deferred, 1 P1 dropped)
- Code added: ~4,500 lines across 35 files
- API endpoints: 8 (opportunities, sanity checks, trust, public models)
- UI pages: 6 (opportunities list/detail, sanity check, trust dashboard, guest flow)
- Cron jobs: 1 (opportunity generation, daily 5am UTC)
Cumulative Progress:
- Phases complete: 4/6 (Phase 0, 0.5, 1, 2, 3)
- Tasks complete: 45/72 (63%)
- Lines of code: ~13,000
- Waitlist signups: 150+
- Beta launch: 7 days (Phase 4 complete)
Phase 4 Target:
- Start: Jan 22
- Complete: Jan 27 (5 days)
- Tasks: 11 (Stripe subscriptions, billing, trial flow)
- Goal: Paying customers by end of month
Join the Beta
Beta perks:
- ✅ Lifetime free Solo tier ($9.95/mo value)
- ✅ Early access to all features
- ✅ Shape the product with your feedback
- ✅ Direct line to the founder
How to join:
- Try guest flow → modeloptix.com/try (3 free comparisons)
- Like what you see? → Join waitlist (150+ builders ahead of you)
- Get beta invite → Email when Phase 4 launches (7 days)
- Sign up → Lifetime free Solo tier activated
- Start saving → Add your products, get recommendations
Or just try the calculator: modeloptix.com
Let's Connect
Try the product:
- Guest comparison: modeloptix.com/try
- Savings calculator: modeloptix.com
- Join waitlist: modeloptix.com
Connect with me and follow the build:
- X/Twitter: @Jamie_within - Daily updates
- LinkedIn: Jamie Watters - Weekly deep dives
- Blog: jamiewatters.work
Building something similar? Working with AI models? Have feedback?
Drop a comment or DM - I'd love to hear what you're working on.
Next post: Phase 4 progress (Stripe subscriptions + beta launch)
Previously:
- Phase 2 Complete: Model Catalog + Portfolio
- Phase 1 Complete: Foundation Infrastructure
- Introducing ModelOptix
Building in public, one phase at a time. Stop overpaying for AI.
Next milestone: Phase 4 complete (Monetization) - Beta launch - Target: Jan 27