Performance Scores

Overall

70
Rank #14 of 45 (better than 69% of models)

SWE-bench

64
Rank #14 of 45 (better than 69% of models)

LiveCodeBench

72
Rank #15 of 45 (better than 67% of models)

HumanEval

90
Rank #9 of 45 (better than 80% of models)

BigCodeBench

54
Rank #17 of 45 (better than 62% of models)
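
The percentage shown next to each rank appears to be the share of the 45 ranked models that sit below this one, not the model's position within a top slice. The page does not state its formula, so the sketch below is an assumption: it takes (total - rank) / total, rounds to a whole percent, and happens to reproduce every figure listed above. The function name `percent_outranked` is hypothetical.

```python
def percent_outranked(rank: int, total: int) -> int:
    """Share of the field ranked below `rank`, as a whole-number percentage.

    Assumed formula (not stated by the source): (total - rank) / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks copied from the scores above; every benchmark ranks 45 models.
ranks = {
    "Overall": 14,
    "SWE-bench": 14,
    "LiveCodeBench": 15,
    "HumanEval": 9,
    "BigCodeBench": 17,
}

for name, rank in ranks.items():
    print(f"{name}: rank {rank} of 45 -> {percent_outranked(rank, 45)}%")
```

Read this way, a higher percentage is better, which is why rank #9 carries the largest figure.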

Strengths & Weaknesses

Strengths

  • Reasoning at lower cost
  • Good for competitive programming

Weaknesses

  • Slower than standard models

Compare with Similar-Priced Models

Model               Overall Score   Input ($ per 1M tokens)
o1-mini             70              $1.10
Claude 3.5 Haiku    52              $0.80
Claude 3 Haiku      45              $0.25
GPT-4o mini         58              $0.15
GPT-3.5 Turbo       40              $0.50
OpenAI o3-mini      80              $1.10
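
One way to read the table above is to relate each model's overall score to its input price. The sketch below is illustrative only: "score per dollar" is not a metric the page defines, it ignores output-token pricing and latency, and the name `score_per_dollar` is hypothetical. The scores and prices are copied from the table.

```python
# (model, overall score, input price in USD per 1M tokens), copied from the table.
models = [
    ("o1-mini", 70, 1.10),
    ("Claude 3.5 Haiku", 52, 0.80),
    ("Claude 3 Haiku", 45, 0.25),
    ("GPT-4o mini", 58, 0.15),
    ("GPT-3.5 Turbo", 40, 0.50),
    ("OpenAI o3-mini", 80, 1.10),
]


def score_per_dollar(score: float, price: float) -> float:
    """Overall score divided by input price; an assumed, simplistic value metric."""
    return score / price


# Sort from most to least score per input dollar.
ranked = sorted(models, key=lambda m: score_per_dollar(m[1], m[2]), reverse=True)

for name, score, price in ranked:
    print(f"{name:<18} score={score:<3} price=${price:.2f} "
          f"score/$={score_per_dollar(score, price):.1f}")
```

A ratio like this rewards cheap models heavily, so it is best treated as a rough first filter rather than a ranking of quality.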