Claude 3.5 Sonnet — Benchmark Results
Anthropic
Overall Score: 72/100
Performance Scores
| Benchmark | Score | Rank (of 45) | Models outranked |
|---|---|---|---|
| Overall | 72 | #10 | 78% |
| SWE-bench | 68 | #10 | 78% |
| LiveCodeBench | 75 | #11 | 76% |
| HumanEval | 90 | #9 | 80% |
| BigCodeBench | 58 | #10 | 78% |
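The percentage shown beside each rank is consistent with the share of the 45 ranked models that sit below that rank, rounded to a whole number. The page does not state its formula, so the sketch below is an inference from the published figures, and `percentile_from_rank` is a hypothetical helper name rather than anything the site defines.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Share of ranked models placed below `rank`, as a whole-number percent.

    Assumed formula (inferred from the page's figures, not documented there):
    round(100 * (total - rank) / total)
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


TOTAL_MODELS = 45

# Ranks as listed in the Performance Scores section.
ranks = {
    "Overall": 10,
    "SWE-bench": 10,
    "LiveCodeBench": 11,
    "HumanEval": 9,
    "BigCodeBench": 10,
}

percentiles = {
    name: percentile_from_rank(rank, TOTAL_MODELS) for name, rank in ranks.items()
}

for name, pct in percentiles.items():
    print(f"{name}: rank #{ranks[name]} of {TOTAL_MODELS}, ahead of {pct}% of models")
```

Under this reading, rank #10 of 45 means the model places ahead of roughly 78% of the field, which puts it in about the top 22% of ranked models.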
Strengths & Weaknesses
Strengths
- Balanced performance
- Computer use capability
- Artifact generation
Weaknesses
- Older architecture
- Trails Claude Sonnet 4 (overall score 72 vs. 78 at the same $3.00 input price)
Compare with Similar-Priced Models
| Model | Overall Score | Input price ($ per 1M tokens) |
|---|---|---|
| Claude 3.5 Sonnet | 72 | $3.00 |
| Claude Sonnet 4 | 78 | $3.00 |
| Claude 3 Sonnet | 65 | $3.00 |
| GPT-4o | 75 | $2.50 |
| Qwen 3.6 Plus | 72 | $3.00 |
| Qwen Max | 68 | $1.60 |