AI Search Arena

AI Search Arena evaluates GEO/AEO tools against 50+ standardized metrics across dimensions like AI visibility, content optimization, and technical implementation. Every score is derived from 6 independent AI model evaluations with median-based synthesis to eliminate bias. A fully published methodology, SHA-256 audit packages, and vendor review windows ensure transparency. Market segment filters let users find the best tools for their specific use case, from enterprise SEO to SMB marketing.

Status: Live

Current Metrics

Monthly Recurring Revenue: $0
Active Users: 0
Status: Live

Last updated: March 8, 2026

Technology Stack

Next.js 15 (App Router) · React 19 · TypeScript · Tailwind CSS · shadcn/ui · Prisma · Neon (serverless PostgreSQL) · OpenRouter (6 AI models) · Vercel · Cloudflare R2 · Resend · Plausible · Sentry · GitHub Actions

Build Journey Posts

Case Study

AI Search Arena is an independent monthly benchmark platform that evaluates 27+ AI search optimization (GEO/AEO) tools using a rigorous, transparent methodology. Built to be the definitive reference for practitioners choosing between tools in a rapidly evolving market.

Each benchmark cycle scores every tool across 51 dimensions — spanning AI visibility, content optimization, technical implementation, and more — using 6 independent AI models (OpenAI, Anthropic, Google, Cohere, Mistral, Meta) via a consensus methodology. Scores are synthesized using median aggregation with confidence tags derived from inter-model agreement, ensuring no single model's bias influences results.
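A minimal sketch of that synthesis step, assuming scores are numeric; the confidence cutoffs and tag names here are illustrative assumptions, not the platform's actual values:

```typescript
// Synthesize one dimension's score from the 6 independent model scores.
// Thresholds and tag names below are illustrative assumptions.
type Confidence = "high" | "medium" | "low";

interface SynthesizedScore {
  score: number;   // median of the model scores
  stdDev: number;  // inter-model disagreement
  confidence: Confidence;
}

function synthesize(modelScores: number[]): SynthesizedScore {
  const sorted = [...modelScores].sort((a, b) => a - b);
  const mid = sorted.length / 2;
  // Median is robust to a single outlier model, unlike the mean.
  const score =
    sorted.length % 2 === 0
      ? (sorted[mid - 1] + sorted[mid]) / 2
      : sorted[Math.floor(mid)];

  const mean = modelScores.reduce((s, x) => s + x, 0) / modelScores.length;
  const variance =
    modelScores.reduce((s, x) => s + (x - mean) ** 2, 0) / modelScores.length;
  const stdDev = Math.sqrt(variance);

  // Low disagreement maps to high confidence (assumed cutoffs).
  const confidence: Confidence =
    stdDev <= 0.5 ? "high" : stdDev <= 1.5 ? "medium" : "low";

  return { score, stdDev, confidence };
}
```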

The platform features a full operator console for managing benchmark cycles through a 10-state lifecycle from draft and planning through evaluation, synthesis, vendor review, and publication. Tools are enrolled per cycle, evaluated in parallel across all models, and ranked using weighted composite scores with dense ranking.
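Dense ranking means tied tools share a rank and the next distinct score advances by exactly one (1, 2, 2, 3 rather than 1, 2, 2, 4). A sketch of how the weighted composite and dense ranks could be computed; the field names are illustrative:

```typescript
// Rank tools by a weighted composite score using dense ranking.
// Field names are illustrative assumptions.
interface ToolScores {
  name: string;
  dimensionScores: Record<string, number>; // dimension id -> median score
}

function denseRank(
  tools: ToolScores[],
  weights: Record<string, number>
): { name: string; composite: number; rank: number }[] {
  const scored = tools.map((t) => ({
    name: t.name,
    composite: Object.entries(t.dimensionScores).reduce(
      (sum, [dim, score]) => sum + score * (weights[dim] ?? 0),
      0
    ),
  }));

  scored.sort((a, b) => b.composite - a.composite);

  let rank = 0;
  let prev = Number.NaN;
  return scored.map((t) => {
    if (t.composite !== prev) {
      rank += 1; // only advance the rank on a new score value
      prev = t.composite;
    }
    return { ...t, rank };
  });
}
```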

Transparency is core to the design: the complete scoring methodology is published, every cycle produces a SHA-256 sealed audit package, and vendors receive a 5-business-day review window to submit factual corrections before publication. Market segment filters allow users to compare tools within their specific context — enterprise SEO, e-commerce, agency, SMB, and more.
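A sketch of how sealing an audit package might work, assuming Node's built-in crypto and a canonical JSON serialization; the package contents are not specified in the source:

```typescript
import { createHash } from "node:crypto";

// Seal a cycle's audit package by hashing a canonical JSON serialization.
// Any post-publication change flips the digest, so the published SHA-256
// lets anyone verify the package is untouched.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    // Sort keys so logically equal objects always hash identically.
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function sealAuditPackage(pkg: object): string {
  return createHash("sha256").update(canonicalize(pkg)).digest("hex");
}
```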

Built as a solo founder with an AI-assisted development workflow, shipping the full platform from database schema to production in under a week.

Problem Statement

Practitioners evaluating AI search optimization tools had no independent, data-driven benchmark to compare options. Vendor marketing claims were unverifiable, tool comparisons were subjective blog posts, and no standardized scoring methodology existed. Teams were making expensive tool decisions based on anecdote rather than evidence.

Solution Approach

Designed a consensus-based evaluation system where 6 independent AI models score each tool across 51 dimensions, then synthesize results using median aggregation to eliminate individual model bias. Built a 10-state cycle lifecycle with guards and side effects to enforce data integrity at every step — tools can't advance to evaluation without minimum enrollment, scores can't publish without sealed audit packages. Vendor review windows and full methodology publication ensure accountability. The entire platform ships as a single Next.js deployment with Prisma, Neon PostgreSQL, and OpenRouter as the AI gateway.
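A condensed sketch of the guarded lifecycle idea, covering only a subset of the 10 states; the exact state names and guard details are assumptions based on the description above:

```typescript
// Guarded state transitions for a benchmark cycle: a transition only
// fires if every guard passes, so invalid shortcuts are unrepresentable.
// State names and guards beyond those in the text are assumptions.
type CycleState =
  | "draft" | "planning" | "enrollment" | "evaluation"
  | "synthesis" | "vendor_review" | "published";

interface Cycle {
  state: CycleState;
  enrolledToolsPerTrack: Record<string, number>;
  auditPackageHash?: string;
}

const MIN_TOOLS_PER_TRACK = 5;

// Allowed transitions, each with guards that must all return true.
const transitions: Partial<
  Record<CycleState, { to: CycleState; guards: ((c: Cycle) => boolean)[] }>
> = {
  enrollment: {
    to: "evaluation",
    guards: [
      (c) =>
        Object.values(c.enrolledToolsPerTrack).every(
          (n) => n >= MIN_TOOLS_PER_TRACK
        ),
    ],
  },
  vendor_review: {
    to: "published",
    guards: [(c) => c.auditPackageHash !== undefined], // sealed audit required
  },
};

function advance(cycle: Cycle): Cycle {
  const t = transitions[cycle.state];
  if (!t) throw new Error(`No transition from ${cycle.state}`);
  if (!t.guards.every((g) => g(cycle))) {
    throw new Error(`Guards failed for ${cycle.state} -> ${t.to}`);
  }
  return { ...cycle, state: t.to };
}
```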

Lessons Learned

  • Consensus beats individual assessment — Using 6 models with median synthesis and confidence tags based on inter-model agreement produced far more reliable scores than any single model could. Standard deviation directly maps to confidence: low disagreement = high confidence.
  • State machines enforce process integrity — The 10-state cycle lifecycle with transition guards (e.g., minimum 5 tools per track before evaluation) prevented every "shortcut" scenario that would have compromised data quality.
  • Transparency is a feature, not overhead — SHA-256 audit packages, published methodology, and vendor review windows initially felt like extra work but became the platform's core differentiator and credibility driver.
  • Design for resumability — The evaluation pipeline makes ~10,000 API calls per cycle. Making every stage idempotent and resumable meant network failures and timeouts were non-events rather than cycle-killing problems (see the sketch below).
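A minimal sketch of that resumability pattern, assuming each evaluation is keyed by (cycle, tool, model, dimension) and results are persisted before advancing; the key scheme and store interface are illustrative:

```typescript
// Resume-safe evaluation loop: each unit of work has a deterministic key,
// and completed results are persisted before moving on. Re-running the
// stage after a crash skips finished work instead of redoing ~10,000 calls.
// The key scheme and store interface are illustrative assumptions.
interface ResultStore {
  has(key: string): Promise<boolean>;
  save(key: string, value: unknown): Promise<void>;
}

interface EvalJob {
  cycleId: string;
  toolId: string;
  modelId: string;
  dimensionId: string;
}

const keyOf = (j: EvalJob) =>
  `${j.cycleId}:${j.toolId}:${j.modelId}:${j.dimensionId}`;

async function runStage(
  jobs: EvalJob[],
  store: ResultStore,
  evaluate: (job: EvalJob) => Promise<unknown>
): Promise<void> {
  for (const job of jobs) {
    const key = keyOf(job);
    if (await store.has(key)) continue; // idempotent: already done
    const result = await evaluate(job); // may throw on timeout or network error
    await store.save(key, result);      // persist before advancing
  }
}
```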

More Projects

JamieWatters.work

Build-in-public portfolio website documenting my journey from zero to $1B as an AI-powered solopreneur—with real-time metrics, transparent progress, and weekly updates.

Status: Archived

Page Views: 0 · Avg Session: 0s

Next.js 15 (React 19) · TypeScript · Tailwind CSS · Prisma ORM · +6

ISO Tracker

ISO Tracker — Concept.

Status: Planning

MRR: $0 · Users: 0

Next.js 15 · React 19 · TypeScript · Tailwind CSS · +1

FreeCalcHub

Free online calculators (55+ across 7 categories).

Status: Live

Monthly Visitors: 0 · Email Signups: 0

Next.js · React · Tailwind CSS