Performance Scores

Overall

62
Rank #26 of 45 (ahead of 42% of models)

SWE-bench

56
Rank #26 of 45 (ahead of 42% of models)

LiveCodeBench

64
Rank #26 of 45 (ahead of 42% of models)

HumanEval

84
Rank #24 of 45 (ahead of 47% of models)

BigCodeBench

46
Rank #26 of 45 (ahead of 42% of models)
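The percentage shown next to each rank appears to be the share of the 45 models placed below this one, (45 - rank) / 45. That is an inference from the numbers, not a formula the page states. The short sketch below, using hypothetical helper names, reproduces those figures and contrasts them with the conventional "top X%" bracket, rank / 45:

```python
def outranked_share(rank: int, total: int) -> int:
    """Percent of all `total` models placed below position `rank`, rounded."""
    return round(100 * (total - rank) / total)


def top_share(rank: int, total: int) -> int:
    """Percent of all `total` models at or above position `rank`, rounded."""
    return round(100 * rank / total)


# Ranks as listed above, each out of 45 models.
ranks = {
    "Overall": 26,
    "SWE-bench": 26,
    "LiveCodeBench": 26,
    "HumanEval": 24,
    "BigCodeBench": 26,
}

for benchmark, rank in ranks.items():
    print(
        f"{benchmark}: #{rank} of 45, ahead of {outranked_share(rank, 45)}%, "
        f"within the top {top_share(rank, 45)}%"
    )
```

Under this reading, rank #26 falls within the top 58% of models, and rank #24 within the top 53%.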

Strengths & Weaknesses

Strengths

  • Excellent value: an overall score of 62 at $0.270 per 1M input tokens
  • Strong coding focus

Weaknesses

  • Less suited to general-purpose, non-coding workloads

Compare with Similar-Priced Models

Model                 Overall Score   Input price ($ per 1M tokens)
DeepSeek Chat (V3)    62              $0.270
Claude 3.5 Haiku      52              $0.800
Claude 3 Haiku        45              $0.250
GPT-4o mini           58              $0.150
GPT-3.5 Turbo         40              $0.500
OpenAI o1-mini        70              $1.100
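One rough way to quantify "value" from this table is to divide overall score by input price. The sketch below does that. The metric is an assumption made for illustration, not the page's own method. It also ignores output-token pricing, which the table does not list.

```python
# Overall score and input price (USD per 1M input tokens), copied from the table above.
models = {
    "DeepSeek Chat (V3)": (62, 0.270),
    "Claude 3.5 Haiku": (52, 0.800),
    "Claude 3 Haiku": (45, 0.250),
    "GPT-4o mini": (58, 0.150),
    "GPT-3.5 Turbo": (40, 0.500),
    "OpenAI o1-mini": (70, 1.10),
}


def points_per_dollar(score: float, price: float) -> float:
    """Overall-score points per USD of input price (hypothetical value metric)."""
    return score / price


# Sort models from most to fewest score points per input dollar.
ranking = sorted(
    models.items(),
    key=lambda item: points_per_dollar(*item[1]),
    reverse=True,
)

for name, (score, price) in ranking:
    print(f"{name:<20} {score:>3}  ${price:.3f}  {points_per_dollar(score, price):6.1f} pts/$")
```

By this crude measure, DeepSeek Chat (V3) places second (about 230 points per dollar), behind GPT-4o mini (about 387) and ahead of Claude 3 Haiku (180). Blending in output pricing or per-benchmark scores could change the order.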