Performance Scores

Overall

72
Rank #10 of 45 (ahead of 78% of models)

SWE-bench

68
Rank #10 of 45 (ahead of 78% of models)

LiveCodeBench

76
Rank #10 of 45 (ahead of 78% of models)

HumanEval

90
Rank #9 of 45 (ahead of 80% of models)

BigCodeBench

56
Rank #12 of 45 (ahead of 73% of models)
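
The percentage beside each rank appears to be the share of the 45 ranked models that this model places ahead of, not the fraction of the leaderboard it sits within. The page does not state its formula, so the sketch below is an assumption: it computes (total - rank) / total and rounds to a whole percent, which happens to reproduce all five figures above. The function name percent_outranked is hypothetical.

```python
def percent_outranked(rank: int, total: int) -> int:
    """Share of ranked models placed below `rank`, as a whole percent.

    Assumed formula (not confirmed by the page): (total - rank) / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round((total - rank) / total * 100)


# Ranks copied from the scores above; every benchmark lists 45 models.
TOTAL_MODELS = 45
ranks = {
    "Overall": 10,
    "SWE-bench": 10,
    "LiveCodeBench": 10,
    "HumanEval": 9,
    "BigCodeBench": 12,
}

for benchmark, rank in ranks.items():
    pct = percent_outranked(rank, TOTAL_MODELS)
    print(f"{benchmark}: rank #{rank} of {TOTAL_MODELS}, ahead of {pct}% of models")
```

Read this way, a rank of #10 of 45 corresponds to roughly the top 22% of the leaderboard.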

Strengths & Weaknesses

Strengths

  • Strong chain-of-thought reasoning
  • Good value: the highest overall score (72) among the similarly priced models compared below

Weaknesses

  • Slow on simple tasks

Compare with Similar-Priced Models

Model                    Overall Score   Input $/M tokens
DeepSeek Reasoner (R1)   72              $0.550
Claude 3.5 Haiku         52              $0.800
Claude 3 Haiku           45              $0.250
GPT-4o mini              58              $0.150
GPT-3.5 Turbo            40              $0.500
OpenAI o1-mini           70              $1.10
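
One rough way to compare these rows on value is to divide each overall score by its input price. The page does not define how it judges value, so this ratio is purely illustrative; the name points_per_dollar is hypothetical, and the sketch assumes the price column is US dollars per million input tokens.

```python
# (model, overall score, input price in USD) copied from the comparison table.
models = [
    ("DeepSeek Reasoner (R1)", 72, 0.550),
    ("Claude 3.5 Haiku", 52, 0.800),
    ("Claude 3 Haiku", 45, 0.250),
    ("GPT-4o mini", 58, 0.150),
    ("GPT-3.5 Turbo", 40, 0.500),
    ("OpenAI o1-mini", 70, 1.10),
]


def points_per_dollar(score: float, price: float) -> float:
    """Hypothetical value metric: overall score points per input dollar."""
    return score / price


# Sort from the highest ratio to the lowest.
ranking = sorted(
    ((name, points_per_dollar(score, price)) for name, score, price in models),
    key=lambda item: item[1],
    reverse=True,
)

for name, value in ranking:
    print(f"{name:<24} {value:7.1f}")
```

A ratio like this rewards very cheap models heavily, so it is best read alongside the raw scores and not as a replacement for them.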