Grok 3 — Benchmark Results
Developer: xAI; Overall Score: 70/100
Performance Scores
| Benchmark | Score | Rank (of 45) | Models outranked |
|---|---|---|---|
| Overall | 70 | #14 | 69% |
| SWE-bench | 64 | #14 | 69% |
| LiveCodeBench | 72 | #15 | 67% |
| HumanEval | 88 | #14 | 69% |
| BigCodeBench | 56 | #12 | 73% |
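The page does not say how the percentage next to each rank or the overall score is derived. As a rough check, the sketch below assumes the percentage is the share of the 45 models ranked below Grok 3, and that the overall score is the unweighted mean of the four benchmark scores. Both formulas are assumptions that happen to reproduce the listed figures; the names `percent_outranked` and `mean_score` are illustrative only.

```python
# Scores and ranks copied from the results above.
# The two formulas are assumptions, not a documented scoring method.
BENCHMARKS = {
    "SWE-bench": (64, 14),
    "LiveCodeBench": (72, 15),
    "HumanEval": (88, 14),
    "BigCodeBench": (56, 12),
}
TOTAL_MODELS = 45


def percent_outranked(rank: int, total: int = TOTAL_MODELS) -> int:
    """Share of models ranked below `rank`, rounded to a whole percent."""
    return round(100 * (total - rank) / total)


def mean_score(benchmarks: dict) -> float:
    """Unweighted mean of the benchmark scores."""
    scores = [score for score, _rank in benchmarks.values()]
    return sum(scores) / len(scores)


for name, (score, rank) in BENCHMARKS.items():
    print(f"{name}: {score}, rank #{rank}, ahead of {percent_outranked(rank)}% of models")
print(f"Unweighted mean of benchmark scores: {mean_score(BENCHMARKS):.0f}")
```

Read this way, the percentage is a percentile: rank #14 of 45 places the model ahead of about 69% of the field, which is the top third rather than the "top 69%".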
Strengths & Weaknesses
Strengths
- Strong reasoning
- X integration
Weaknesses
- Newer model
- Limited ecosystem
Compare with Similar-Priced Models
| Model | Overall Score | Input price (USD per 1M tokens) |
|---|---|---|
| Grok 3 | 70 | $3.00 |
| Claude Sonnet 4 | 78 | $3.00 |
| Claude 3.5 Sonnet | 72 | $3.00 |
| Claude 3 Sonnet | 65 | $3.00 |
| GPT-4o | 75 | $2.50 |
| Qwen 3.6 Plus | 72 | $3.00 |
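To make the comparison easier to read, the following sketch computes each model's overall-score gap relative to Grok 3, using only the values in the table above. The helper name `score_gaps` is hypothetical, and the price column is assumed to be USD per one million input tokens.

```python
# (model, overall score, input price), copied from the comparison table.
# Price is assumed to be USD per 1M input tokens.
MODELS = [
    ("Grok 3", 70, 3.00),
    ("Claude Sonnet 4", 78, 3.00),
    ("Claude 3.5 Sonnet", 72, 3.00),
    ("Claude 3 Sonnet", 65, 3.00),
    ("GPT-4o", 75, 2.50),
    ("Qwen 3.6 Plus", 72, 3.00),
]


def score_gaps(models, baseline="Grok 3"):
    """Return (model, score minus baseline score), largest gap first."""
    base = next(score for name, score, _price in models if name == baseline)
    gaps = [(name, score - base) for name, score, _price in models if name != baseline]
    # sorted() is stable, so ties keep their table order.
    return sorted(gaps, key=lambda item: item[1], reverse=True)


for name, gap in score_gaps(MODELS):
    print(f"{name}: {gap:+d} points vs Grok 3")
```

On these figures, four of the five similarly priced models score higher than Grok 3 overall, and only Claude 3 Sonnet scores lower.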