Claude Opus 4 — Benchmark Results
Provider: Anthropic
Overall Score: 86/100
Performance Scores
| Benchmark | Score | Rank |
|---|---|---|
| Overall | 86 | #1 of 45 (98th percentile) |
| SWE-bench | 84 | #1 of 45 (98th percentile) |
| LiveCodeBench | 88 | #1 of 45 (98th percentile) |
| HumanEval | 96 | #1 of 45 (98th percentile) |
| BigCodeBench | 76 | #1 of 45 (98th percentile) |
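The page does not say how the overall score or the 98% rank figure is derived. Below is a minimal sketch that assumes two things. First, the overall score is the unweighted mean of the four benchmark scores. Second, the percentile is the share of the 45 ranked models placed below this one. Both assumptions reproduce the listed numbers: the mean of 84, 88, 96 and 76 is 86, and 44 of 45 is about 97.8%, which rounds to 98. The helper names `overall_score` and `percentile_from_rank` are hypothetical.

```python
# Per-benchmark scores copied from the list above.
SCORES = {
    "SWE-bench": 84,
    "LiveCodeBench": 88,
    "HumanEval": 96,
    "BigCodeBench": 76,
}


def overall_score(scores: dict[str, int]) -> float:
    """Unweighted mean of benchmark scores (assumed aggregation, not confirmed by the page)."""
    return sum(scores.values()) / len(scores)


def percentile_from_rank(rank: int, total: int) -> float:
    """Percent of models ranked strictly below `rank` (assumed definition of the 98% figure)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total * 100


print(overall_score(SCORES))                  # 86.0
print(round(percentile_from_rank(1, 45), 1))  # 97.8
```

This match is consistent with an unweighted mean but does not prove it. A weighted aggregate over different benchmarks could also land on 86.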
Strengths & Weaknesses
Strengths
- Best at complex reasoning
- Strong system design
- Excellent debugging
Weaknesses
- Expensive for bulk tasks
- Slower response times
Compare with Similar-Priced Models
| Model | Overall Score | Input price (USD per 1M tokens) |
|---|---|---|
| Claude Opus 4 | 86 | $15.00 |
| Claude 3 Opus | 78 | $15.00 |
| GPT-4 Turbo | 70 | $10.00 |
| OpenAI o1 | 83 | $15.00 |
| OpenAI o3 | 85 | $10.00 |
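To make "Expensive for bulk tasks" concrete, here is a minimal sketch that estimates input-side cost for a hypothetical 50M-token batch job, using the prices in the table. It covers input tokens only because the table lists no output prices. `input_cost_usd` is a hypothetical helper.

```python
# Input prices in USD per 1M tokens, copied from the comparison table.
INPUT_PRICE_PER_M = {
    "Claude Opus 4": 15.00,
    "Claude 3 Opus": 15.00,
    "GPT-4 Turbo": 10.00,
    "OpenAI o1": 15.00,
    "OpenAI o3": 10.00,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-token cost only; output tokens are billed separately and are not covered here."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000


workload = 50_000_000  # hypothetical batch job: 50M input tokens
for model in INPUT_PRICE_PER_M:
    print(f"{model}: ${input_cost_usd(model, workload):,.2f}")
```

At these list prices, the same 50M input tokens cost $750.00 on Claude Opus 4 and $500.00 on GPT-4 Turbo or OpenAI o3. Real bills also depend on output volume and any discounts, and this sketch ignores both.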