Performance Scores

Overall

78
Rank #6 of 45 (87th percentile)

SWE-bench

74
Rank #6 of 45 (87th percentile)

LiveCodeBench

80
Rank #7 of 45 (84th percentile)

HumanEval

94
Rank #4 of 45 (91st percentile)

BigCodeBench

64
Rank #6 of 45 (87th percentile)
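The percentage shown beside each rank appears to be the share of the 45 ranked models that place below this one. That formula is an assumption inferred from the figures on this page, not something the page documents. A minimal sketch that reproduces the numbers under that assumption (the function name `percentile_from_rank` is hypothetical):

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Percentage of the field ranked below `rank`, rounded to an integer.

    Assumption: this mirrors how the leaderboard derives its percentages;
    it is inferred from the ranks shown, not taken from the source.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Models strictly below this rank, as a share of all ranked models
    return round((total - rank) / total * 100)


# Ranks listed above, each out of 45 models
for name, rank in [("Overall", 6), ("LiveCodeBench", 7), ("HumanEval", 4)]:
    print(name, percentile_from_rank(rank, 45))
```

Under this assumption, rank #6 gives 39/45 (about 86.7%, shown as 87), rank #7 gives 38/45 (84) and rank #4 gives 41/45 (91), matching every figure listed.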

Strengths & Weaknesses

Strengths

  • Strong reasoning
  • Proven track record

Weaknesses

  • Older generation
  • Expensive

Compare with Similar-Priced Models

Model            Overall Score   Input price ($ per 1M tokens)
Claude 3 Opus    78              $15.00
Claude Opus 4    86              $15.00
GPT-4 Turbo      70              $10.00
OpenAI o1        83              $15.00
OpenAI o3        85              $10.00