Performance Scores

Overall

80
Rank #4 of 45 (91st percentile)

SWE-bench

76
Rank #4 of 45 (91st percentile)

LiveCodeBench

85
Rank #3 of 45 (93rd percentile)

HumanEval

94
Rank #4 of 45 (91st percentile)

BigCodeBench

65
Rank #5 of 45 (89th percentile)
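The percentile figures alongside each rank appear to be the share of the 45 ranked models that score below this one. That reading is an assumption, but it reproduces every figure on this page. A minimal sketch, using a hypothetical helper `percentile_from_rank` and assuming rank 1 is best with no ties:

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Percentage of the field ranked below `rank`, rounded to an integer.

    Hypothetical helper: assumes rank 1 is best and no ties.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks shown on this page, out of 45 models.
ranks = {"Overall": 4, "SWE-bench": 4, "LiveCodeBench": 3,
         "HumanEval": 4, "BigCodeBench": 5}

for name, rank in ranks.items():
    print(f"{name}: rank #{rank} of 45, ahead of "
          f"{percentile_from_rank(rank, 45)}% of models")
```

For example, rank #4 gives (45 - 4) / 45 ≈ 0.911, or the 91st percentile. Equivalently, the model sits in roughly the top 9% (4 / 45) of the field.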

Strengths & Weaknesses

Strengths

  • Excellent at competitive programming
  • Strong algorithmic reasoning

Weaknesses

  • Optimized for reasoning, not chat

Compare with Similar-Priced Models

Model              Overall score   Input price (USD per 1M tokens)
o3-mini            80              $1.10
Claude 3.5 Haiku   52              $0.80
Claude 3 Haiku     45              $0.25
GPT-4o mini        58              $0.15
GPT-3.5 Turbo      40              $0.50
OpenAI o1-mini     70              $1.10