Claude Sonnet 4 — Benchmark Results
Anthropic
Overall Score: 78/100
Performance Scores
| Benchmark | Score | Rank (of 45) | Share of models outranked |
|---|---|---|---|
| Overall | 78 | #6 | 87% |
| SWE-bench | 74 | #6 | 87% |
| LiveCodeBench | 82 | #5 | 89% |
| HumanEval | 92 | #7 | 84% |
| BigCodeBench | 64 | #6 | 87% |
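The percentage shown next to each rank appears consistent with the share of the 45 ranked models that place lower, that is (total - rank) / total, rounded to a whole number. The source does not state how the figure is derived, so the formula below is an assumption inferred from the numbers, and `share_outranked` is a hypothetical helper name. A minimal sketch:

```python
def share_outranked(rank: int, total: int) -> int:
    """Percentage of ranked models placed below `rank`, rounded to a whole number.

    Assumed formula (not stated by the source): 100 * (total - rank) / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks copied from the scores listed above; 45 is the stated field size.
ranks = {
    "Overall": 6,
    "SWE-bench": 6,
    "LiveCodeBench": 5,
    "HumanEval": 7,
    "BigCodeBench": 6,
}

for name, rank in ranks.items():
    print(f"{name}: rank #{rank} of 45 -> {share_outranked(rank, 45)}%")
```

Under this assumption, ranks 5, 6, and 7 give 89%, 87%, and 84%, which matches the listed figures. Read this way, rank #6 of 45 places the model in roughly the top 13% of the field.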
Strengths & Weaknesses
Strengths
- Price-performance leader
- Strong at web development
- Excellent code review
Weaknesses
- Struggles with complex algorithms
- Less consistent on system design
Compare with Similar-Priced Models
| Model | Overall Score | Input Price ($ per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | 78 | $3.00 |
| Claude 3.5 Sonnet | 72 | $3.00 |
| Claude 3 Sonnet | 65 | $3.00 |
| GPT-4o | 75 | $2.50 |
| Qwen 3.6 Plus | 72 | $3.00 |
| Qwen Max | 68 | $1.60 |