Performance Scores

Overall

68
Rank #18 of 45 (60th percentile)

SWE-bench

62
Rank #18 of 45 (60th percentile)

LiveCodeBench

70
Rank #18 of 45 (60th percentile)

HumanEval

86
Rank #18 of 45 (60th percentile)

BigCodeBench

54
Rank #17 of 45 (62nd percentile)
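The overall score of 68 equals the unweighted mean of the four benchmark scores, (62 + 70 + 86 + 54) / 4 = 68, and each percentile equals (45 - rank) / 45. The page does not state either rule, so the sketch below treats both as assumptions. The helper names `overall_score` and `percentile_from_rank` are hypothetical.

```python
# Benchmark scores as listed above. The aggregation rule (plain mean) is an
# assumption that happens to reproduce the listed overall score of 68.
scores = {
    "SWE-bench": 62,
    "LiveCodeBench": 70,
    "HumanEval": 86,
    "BigCodeBench": 54,
}


def overall_score(benchmarks: dict[str, int]) -> float:
    """Unweighted mean of the per-benchmark scores (assumed aggregation)."""
    return sum(benchmarks.values()) / len(benchmarks)


def percentile_from_rank(rank: int, total: int) -> int:
    """Percent of ranked models placed below this one, rounded to a whole number."""
    return round(100 * (total - rank) / total)


print(overall_score(scores))         # 68.0
print(percentile_from_rank(18, 45))  # 60
print(percentile_from_rank(17, 45))  # 62
```

Under this reading, rank #18 of 45 places the model ahead of 27 others (60%), which is the same as being within the top 40%.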

Strengths & Weaknesses

Strengths

  • Good value: the highest overall score (68) among the compared models priced under $1.00 per 1M input tokens
  • Latest architecture

Weaknesses

  • Mini variant limitations

Compare with Similar-Priced Models

Model               Overall score   Input price (USD per 1M tokens)
GPT-4.1 Mini        68              $0.40
Claude 3.5 Haiku    52              $0.80
Claude 3 Haiku      45              $0.25
GPT-4o mini         58              $0.15
GPT-3.5 Turbo       40              $0.50
OpenAI o1-mini      70              $1.10
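One rough way to compare value across this table is overall-score points per dollar of input price. This is a minimal sketch using only the table's figures. The metric and the `points_per_dollar` helper are hypothetical, and the sketch ignores output-token pricing, which the table does not list.

```python
# (model, overall score, input USD per 1M tokens), copied from the table above.
models = [
    ("GPT-4.1 Mini", 68, 0.40),
    ("Claude 3.5 Haiku", 52, 0.80),
    ("Claude 3 Haiku", 45, 0.25),
    ("GPT-4o mini", 58, 0.15),
    ("GPT-3.5 Turbo", 40, 0.50),
    ("OpenAI o1-mini", 70, 1.10),
]


def points_per_dollar(score: int, price: float) -> float:
    """Overall-score points per USD of input price (crude, input-only proxy)."""
    return score / price


# Sort from most to fewest points per input dollar.
ranked = sorted(models, key=lambda m: points_per_dollar(m[1], m[2]), reverse=True)
for name, score, price in ranked:
    print(f"{name:<18} {points_per_dollar(score, price):7.1f}")
```

By this input-only ratio, GPT-4o mini and Claude 3 Haiku return more points per dollar than GPT-4.1 Mini. GPT-4.1 Mini, however, scores 10 or more points higher than either. OpenAI o1-mini scores only 2 points higher at 2.75 times the input price.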