GPT-4o — Benchmark Results
Provider: OpenAI
Overall Score: 75/100
Performance Scores
Overall
75
Rank #9 of 45 (top 20%)
SWE-bench
70
Rank #9 of 45 (top 20%)
LiveCodeBench
78
Rank #9 of 45 (top 20%)
HumanEval
90
Rank #9 of 45 (top 20%)
BigCodeBench
62
Rank #9 of 45 (top 20%)
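The overall score of 75 matches the unweighted mean of the four benchmark scores listed above, and rank 9 of 45 corresponds to the top 20% of the field (equivalently, the 80th percentile). The sketch below checks both pieces of arithmetic. The page does not say how the overall score is actually computed, so the equal weighting and the helper names `unweighted_mean` and `top_percent` are assumptions made for illustration.

```python
# Benchmark scores as listed on this page.
scores = {
    "SWE-bench": 70,
    "LiveCodeBench": 78,
    "HumanEval": 90,
    "BigCodeBench": 62,
}

def unweighted_mean(values):
    """Average with equal weights (assumed; the page does not document its formula)."""
    values = list(values)
    return sum(values) / len(values)

def top_percent(rank, total):
    """Share of the field at or above the given rank, in percent."""
    return 100 * rank / total

overall = unweighted_mean(scores.values())
print(overall)              # 75.0
print(top_percent(9, 45))   # 20.0
```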
Strengths & Weaknesses
Strengths
- Strong general-purpose performance
- Good multimodal capabilities
Weaknesses
- Less consistent on coding tasks than Claude
Compare with Similar-Priced Models
| Model | Overall Score | Input Price ($/M tokens) |
|---|---|---|
| GPT-4o | 75 | $2.50 |
| Claude Sonnet 4 | 78 | $3.00 |
| Claude 3.5 Sonnet | 72 | $3.00 |
| Claude 3 Sonnet | 65 | $3.00 |
| Qwen 3.6 Plus | 72 | $3.00 |
| Qwen Max | 68 | $1.60 |
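
One rough way to read the table is to divide each overall score by the input price. The sketch below does that with the figures copied from the table. "Score per dollar" is a hypothetical metric introduced here for illustration, not one the page defines, and it ignores output pricing and any other costs.

```python
# (model, overall score, input price in USD per million tokens), copied from the table.
models = [
    ("GPT-4o", 75, 2.50),
    ("Claude Sonnet 4", 78, 3.00),
    ("Claude 3.5 Sonnet", 72, 3.00),
    ("Claude 3 Sonnet", 65, 3.00),
    ("Qwen 3.6 Plus", 72, 3.00),
    ("Qwen Max", 68, 1.60),
]

def score_per_dollar(score, price):
    """Overall score points per USD of input price (hypothetical metric)."""
    return score / price

# Sort from best to worst value; ties keep their table order.
ranked = sorted(models, key=lambda m: score_per_dollar(m[1], m[2]), reverse=True)
for name, score, price in ranked:
    print(f"{name}: {score_per_dollar(score, price):.1f}")
```

By this measure Qwen Max comes out ahead of GPT-4o because of its lower input price, even though its overall score is lower.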