The AI Authority Leaderboard
Aggregated rankings from the world's most trusted AI evaluation labs.
AI Mastery Index: our weighted composite metric combining intelligence, coding capability, and cost-efficiency (a sketch of the weighting follows the table).
| Rank | Model | Provider | Score (pts) |
|---|---|---|---|
| 1 | GPT-5.5 Pro | OpenAI | 94.2 |
| 2 | Claude 4.7 Opus | Anthropic | 93.8 |
| 3 | Gemini 3.1 Ultra | Google | 92.5 |
| 4 | DeepSeek-V4 | DeepSeek | 91.8 |
| 5 | Llama 4 (405B) | Meta | 90.5 |
| 6 | Mistral Large 3 | Mistral | 89.2 |
| 7 | o1-preview | OpenAI | 88.5 |
| 8 | Claude 3.5 Sonnet | Anthropic | 88.0 |
| 9 | GPT-4o | OpenAI | 87.2 |
| 10 | Gemini 1.5 Pro | Google | 86.4 |
| 11 | Llama 3.1 405B | Meta | 85.0 |
| 12 | Grok-2 | xAI | 84.5 |
| 13 | Qwen 2 72B | Alibaba | 83.2 |
| 14 | Llama 3.1 70B | Meta | 82.5 |
| 15 | Command R+ | Cohere | 81.8 |
| 16 | GPT-4o-mini | OpenAI | 81.0 |
| 17 | Gemini 1.5 Flash | Google | 80.2 |
| 18 | Claude 3 Haiku | Anthropic | 78.5 |
| 19 | Mistral NeMo 12B | Mistral | 76.5 |
| 20 | Llama 3.1 8B | Meta | 75.0 |
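As a rough illustration of how a composite like the AI Mastery Index can be computed, here is a minimal Python sketch. The weights (0.4/0.4/0.2) and the example sub-scores are hypothetical placeholders, not the values behind the table above; only the weighted-average structure is what the metric describes.

```python
# Sketch of a weighted composite score in the spirit of the AI Mastery Index.
# The weights and sub-scores below are illustrative placeholders, not the
# actual values used for the leaderboard.

WEIGHTS = {
    "intelligence": 0.4,     # hypothetical weight
    "coding": 0.4,           # hypothetical weight
    "cost_efficiency": 0.2,  # hypothetical weight
}

def mastery_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of sub-scores, each on a 0-100 scale."""
    assert set(sub_scores) == set(WEIGHTS), "missing or extra sub-scores"
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)

# Example with made-up sub-scores:
print(mastery_index({"intelligence": 92.0, "coding": 88.0, "cost_efficiency": 85.0}))
# -> ~89.0 (0.4*92 + 0.4*88 + 0.2*85)
```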
Coding Performance
Measured via **HumanEval++** and **LiveCodeBench**. Reflects a model's ability to handle complex system-level refactoring and library integration.
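HumanEval-style benchmarks conventionally report pass@k: the probability that at least one of k sampled completions passes all unit tests. Below is a minimal sketch of the standard unbiased estimator from the original HumanEval paper; whether HumanEval++ uses exactly this estimator is an assumption on our part.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct.

    Computes 1 - C(n - c, k) / C(n, k), i.e. the probability that a
    random size-k subset of the n samples contains at least one
    correct completion.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 200 samples per problem, 30 of them correct, reporting pass@10.
print(round(pass_at_k(200, 30, 10), 4))
```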
Agentic Reasoning
Measured by our proprietary **AMSE-2026** benchmark, which scores success rates in 10-turn planning loops with self-correction and tool use.
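To make the shape of such an evaluation concrete, here is a minimal sketch of a 10-turn plan-act-reflect loop. Everything here (the `Agent` protocol, the `run_tool` stub, the `is_solved` check) is hypothetical scaffolding invented for illustration; AMSE-2026's actual harness is proprietary and not public.

```python
from typing import Callable, Protocol

class Agent(Protocol):
    """Hypothetical agent interface; the real AMSE-2026 harness is not public."""
    def propose(self, goal: str, history: list[str]) -> str: ...
    def reflect(self, history: list[str]) -> str: ...

def run_tool(action: str) -> str:
    """Hypothetical tool executor; stubbed to echo the action for this sketch."""
    return f"observation for: {action}"

def run_episode(agent: Agent, goal: str,
                is_solved: Callable[[str], bool], max_turns: int = 10) -> bool:
    """One episode: up to max_turns of plan -> act -> self-correct."""
    history: list[str] = []
    for _ in range(max_turns):
        action = agent.propose(goal, history)   # plan the next step
        observation = run_tool(action)          # tool use
        history.extend([action, observation])
        if is_solved(observation):              # task-specific success check
            return True
        history.append(agent.reflect(history))  # self-correction note
    return False  # success rate = fraction of episodes returning True
```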
Intelligence ROI
Calculated as Average Score / log10(Cost per 1M tokens). Higher values indicate better value for money.
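As a worked example under this formula (the prices here are made up for illustration): a model averaging 90 points at $10 per 1M tokens scores 90 / log10(10) = 90, while one averaging 85 points at $100 per 1M tokens scores 85 / log10(100) = 42.5. A minimal Python sketch:

```python
import math

def intelligence_roi(avg_score: float, cost_per_1m_tokens: float) -> float:
    """Intelligence ROI = average score / log10(cost per 1M tokens).

    Note: the denominator is only well-behaved for costs above $1 per
    1M tokens (log10(1) = 0, and cheaper models give a negative log).
    """
    return avg_score / math.log10(cost_per_1m_tokens)

# Illustrative prices, not real pricing data:
print(intelligence_roi(90.0, 10.0))   # -> 90.0
print(intelligence_roi(85.0, 100.0))  # -> 42.5
```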