AI Search Arena

AI Search Arena evaluates GEO/AEO tools against 50+ standardized metrics across dimensions like AI visibility, content optimization, and technical implementation. Every score is derived from 6 independent AI model evaluations with median-based synthesis to eliminate bias. A fully published methodology, SHA-256 audit packages, and vendor review windows ensure transparency. Market segment filters let users find the best tools for their specific use case, from enterprise SEO to SMB marketing.

Status: Live

Current Metrics

Monthly Recurring Revenue: $0
Active Users: 0
Status: Live

Last updated: March 8, 2026

Technology Stack

Next.js 15 (App Router) · React 19 · TypeScript · Tailwind CSS · shadcn/ui · Prisma · Neon (serverless PostgreSQL) · OpenRouter (6 AI models) · Vercel · Cloudflare R2 · Resend · Plausible · Sentry · GitHub Actions

Build Journey Posts

Case Study

AI Search Arena is an independent monthly benchmark platform that evaluates 27+ AI search optimization (GEO/AEO) tools using a rigorous, transparent methodology. Built to be the definitive reference for practitioners choosing between tools in a rapidly evolving market.

Each benchmark cycle scores every tool across 51 dimensions — spanning AI visibility, content optimization, technical implementation, and more — using 6 independent AI models (OpenAI, Anthropic, Google, Cohere, Mistral, Meta) via a consensus methodology. Scores are synthesized using median aggregation with confidence tags derived from inter-model agreement, ensuring no single model's bias influences results.
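A minimal sketch of that synthesis step, assuming scores are numeric; the confidence cutoffs and tag names here are illustrative assumptions, not the platform's actual values:

```typescript
// Synthesize one dimension's score from the 6 independent model scores.
// Thresholds and tag names below are illustrative assumptions.
type Confidence = "high" | "medium" | "low";

interface SynthesizedScore {
  score: number;   // median of the model scores
  stdDev: number;  // inter-model disagreement
  confidence: Confidence;
}

function synthesize(modelScores: number[]): SynthesizedScore {
  const sorted = [...modelScores].sort((a, b) => a - b);
  const mid = sorted.length / 2;
  // Median is robust to a single outlier model, unlike the mean.
  const score =
    sorted.length % 2 === 0
      ? (sorted[mid - 1] + sorted[mid]) / 2
      : sorted[Math.floor(mid)];

  const mean = modelScores.reduce((s, x) => s + x, 0) / modelScores.length;
  const variance =
    modelScores.reduce((s, x) => s + (x - mean) ** 2, 0) / modelScores.length;
  const stdDev = Math.sqrt(variance);

  // Low disagreement maps to high confidence (assumed cutoffs).
  const confidence: Confidence =
    stdDev <= 0.5 ? "high" : stdDev <= 1.5 ? "medium" : "low";

  return { score, stdDev, confidence };
}
```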

The platform features a full operator console for managing benchmark cycles through a 10-state lifecycle from draft and planning through evaluation, synthesis, vendor review, and publication. Tools are enrolled per cycle, evaluated in parallel across all models, and ranked using weighted composite scores with dense ranking.
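Dense ranking means tied tools share a rank and the next distinct score advances by exactly one (1, 2, 2, 3 rather than 1, 2, 2, 4). A sketch of how the weighted composite and dense ranks could be computed; the field names are illustrative:

```typescript
// Rank tools by a weighted composite score using dense ranking.
// Field names are illustrative assumptions.
interface ToolScores {
  name: string;
  dimensionScores: Record<string, number>; // dimension id -> median score
}

function denseRank(
  tools: ToolScores[],
  weights: Record<string, number>
): { name: string; composite: number; rank: number }[] {
  const scored = tools.map((t) => ({
    name: t.name,
    composite: Object.entries(t.dimensionScores).reduce(
      (sum, [dim, score]) => sum + score * (weights[dim] ?? 0),
      0
    ),
  }));

  scored.sort((a, b) => b.composite - a.composite);

  let rank = 0;
  let prev = Number.NaN;
  return scored.map((t) => {
    if (t.composite !== prev) {
      rank += 1; // only advance the rank on a new score value
      prev = t.composite;
    }
    return { ...t, rank };
  });
}
```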

Transparency is core to the design: the complete scoring methodology is published, every cycle produces a SHA-256 sealed audit package, and vendors receive a 5-business-day review window to submit factual corrections before publication. Market segment filters allow users to compare tools within their specific context — enterprise SEO, e-commerce, agency, SMB, and more.
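A sketch of how sealing an audit package might work, assuming Node's built-in crypto and a canonical JSON serialization; the package contents are not specified in the source:

```typescript
import { createHash } from "node:crypto";

// Seal a cycle's audit package by hashing a canonical JSON serialization.
// Any post-publication change flips the digest, so the published SHA-256
// lets anyone verify the package is untouched.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    // Sort keys so logically equal objects always hash identically.
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function sealAuditPackage(pkg: object): string {
  return createHash("sha256").update(canonicalize(pkg)).digest("hex");
}
```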

Built as a solo founder with an AI-assisted development workflow, shipping the full platform from database schema to production in under a week.

Problem Statement

Practitioners evaluating AI search optimization tools had no independent, data-driven benchmark to compare options. Vendor marketing claims were unverifiable, tool comparisons were subjective blog posts, and no standardized scoring methodology existed. Teams were making expensive tool decisions based on anecdote rather than evidence.

Solution Approach

Designed a consensus-based evaluation system where 6 independent AI models score each tool across 51 dimensions, then synthesize results using median aggregation to eliminate individual model bias. Built a 10-state cycle lifecycle with guards and side effects to enforce data integrity at every step — tools can't advance to evaluation without minimum enrollment, scores can't publish without sealed audit packages. Vendor review windows and full methodology publication ensure accountability. The entire platform ships as a single Next.js deployment with Prisma, Neon PostgreSQL, and OpenRouter as the AI gateway.
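A condensed sketch of the guarded lifecycle idea, covering only a subset of the 10 states; the exact state names and guard details are assumptions based on the description above:

```typescript
// Guarded state transitions for a benchmark cycle: a transition only
// fires if every guard passes, so invalid shortcuts are unrepresentable.
// State names and guards beyond those in the text are assumptions.
type CycleState =
  | "draft" | "planning" | "enrollment" | "evaluation"
  | "synthesis" | "vendor_review" | "published";

interface Cycle {
  state: CycleState;
  enrolledToolsPerTrack: Record<string, number>;
  auditPackageHash?: string;
}

const MIN_TOOLS_PER_TRACK = 5;

// Allowed transitions, each with guards that must all return true.
const transitions: Partial<
  Record<CycleState, { to: CycleState; guards: ((c: Cycle) => boolean)[] }>
> = {
  enrollment: {
    to: "evaluation",
    guards: [
      (c) =>
        Object.values(c.enrolledToolsPerTrack).every(
          (n) => n >= MIN_TOOLS_PER_TRACK
        ),
    ],
  },
  vendor_review: {
    to: "published",
    guards: [(c) => c.auditPackageHash !== undefined], // sealed audit required
  },
};

function advance(cycle: Cycle): Cycle {
  const t = transitions[cycle.state];
  if (!t) throw new Error(`No transition from ${cycle.state}`);
  if (!t.guards.every((g) => g(cycle))) {
    throw new Error(`Guards failed for ${cycle.state} -> ${t.to}`);
  }
  return { ...cycle, state: t.to };
}
```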

Lessons Learned

  • Consensus beats individual assessment — Using 6 models with median synthesis and confidence tags based on inter-model agreement produced far more reliable scores than any single model could. Standard deviation directly maps to confidence: low disagreement = high confidence.
  • State machines enforce process integrity — The 10-state cycle lifecycle with transition guards (e.g., minimum 5 tools per track before evaluation) prevented every "shortcut" scenario that would have compromised data quality.
  • Transparency is a feature, not overhead — SHA-256 audit packages, published methodology, and vendor review windows initially felt like extra work but became the platform's core differentiator and credibility driver.
  • Design for resumability — The evaluation pipeline makes ~10,000 API calls per cycle. Making every stage idempotent and resumable meant network failures and timeouts were non-events rather than cycle-killing problems (see the sketch below).
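A minimal sketch of that resumability pattern, assuming each evaluation is keyed by (cycle, tool, model, dimension) and results are persisted before advancing; the key scheme and store interface are illustrative:

```typescript
// Resume-safe evaluation loop: each unit of work has a deterministic key,
// and completed results are persisted before moving on. Re-running the
// stage after a crash skips finished work instead of redoing ~10,000 calls.
// The key scheme and store interface are illustrative assumptions.
interface ResultStore {
  has(key: string): Promise<boolean>;
  save(key: string, value: unknown): Promise<void>;
}

interface EvalJob {
  cycleId: string;
  toolId: string;
  modelId: string;
  dimensionId: string;
}

const keyOf = (j: EvalJob) =>
  `${j.cycleId}:${j.toolId}:${j.modelId}:${j.dimensionId}`;

async function runStage(
  jobs: EvalJob[],
  store: ResultStore,
  evaluate: (job: EvalJob) => Promise<unknown>
): Promise<void> {
  for (const job of jobs) {
    const key = keyOf(job);
    if (await store.has(key)) continue; // idempotent: already done
    const result = await evaluate(job); // may throw on timeout or network error
    await store.save(key, result);      // persist before advancing
  }
}
```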

More Projects

JamieWatters.work

Build-in-public portfolio website documenting my journey from zero to $1B as an AI-powered solopreneur—with real-time metrics, transparent progress, and weekly updates.

Status: Archived

Page Views: 0 · Avg Session: 0s

Next.js 15 (React 19) · TypeScript · Tailwind CSS · Prisma ORM · +6

ISO Tracker

ISO Tracker — Concept.

Status: Planning

MRR: $0 · Users: 0

Next.js 15 · React 19 · TypeScript · Tailwind CSS · +1

FreeCalcHub

Free online calculators (55+ across 7 categories).

Status: Live

Monthly Visitors: 0 · Email Signups: 0

Next.js · React · Tailwind CSS