o3 — Benchmark Results
OpenAI
Overall Score: 85/100
Performance Scores
| Benchmark | Score | Rank (of 45) | Percentile |
|---|---|---|---|
| Overall | 85 | #2 | 96th |
| SWE-bench | 82 | #2 | 96th |
| LiveCodeBench | 88 | #1 | 98th |
| HumanEval | 96 | #1 | 98th |
| BigCodeBench | 74 | #2 | 96th |
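The percentage shown next to each rank appears to be a percentile, not a "top N%" share: rank #2 of 45 means the model outscores 43 of 45 entries, which rounds to 96. Likewise, the overall score of 85 matches the unweighted mean of the four benchmark scores. The sketch below checks both readings. The formulas and the names `percentile_from_rank` and `overall_score` are assumptions made for illustration; the page does not state how it computes either number.

```python
from statistics import mean

# Benchmark scores as listed in this section.
SCORES = {
    "SWE-bench": 82,
    "LiveCodeBench": 88,
    "HumanEval": 96,
    "BigCodeBench": 74,
}


def percentile_from_rank(rank: int, total: int) -> int:
    """Percent of the field ranked below `rank`, rounded to a whole number.

    Assumed formula: 100 * (total - rank) / total.
    """
    return round(100 * (total - rank) / total)


def overall_score(scores: dict[str, int]) -> float:
    """Unweighted mean of the per-benchmark scores (assumed aggregation)."""
    return mean(scores.values())


if __name__ == "__main__":
    print(percentile_from_rank(2, 45))  # rank #2 of 45
    print(percentile_from_rank(1, 45))  # rank #1 of 45
    print(overall_score(SCORES))
```

Under these assumptions, rank #2 maps to 96, rank #1 maps to 98, and the mean of the four scores is 85, all of which agree with the figures reported here.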
Strengths & Weaknesses
Strengths
- Latest reasoning model
- Ranks #1 or #2 of 45 on every benchmark listed
Weaknesses
- Very expensive ($10.00 per million input tokens)
- Slow
Compare with Similar-Priced Models
| Model | Overall Score | Input price (per 1M tokens) |
|---|---|---|
| o3 | 85 | $10.00 |
| GPT-4 Turbo | 70 | $10.00 |