Performance Scores

Overall

58
Rank #30 of 45 (ahead of 33% of models)

SWE-bench

50
Rank #30 of 45 (ahead of 33% of models)

LiveCodeBench

60
Rank #30 of 45 (ahead of 33% of models)

HumanEval

78
Rank #31 of 45 (ahead of 31% of models)

BigCodeBench

44
Rank #29 of 45 (ahead of 36% of models)
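
The percentage shown next to each rank is consistent with the share of the 45 ranked models that place lower: (total - rank) / total, rounded to the nearest whole percent. The source does not state this formula; it is inferred from the numbers, and the function name below is hypothetical. A minimal sketch under that assumption:

```python
def share_ranked_below(rank: int, total: int) -> int:
    """Percent of models that place below the given rank (rank 1 = best).

    Assumed formula, inferred from the published figures rather than
    documented by the source.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks taken from the scores listed above; 45 is the number of ranked models.
for benchmark, rank in [("Overall", 30), ("HumanEval", 31), ("BigCodeBench", 29)]:
    print(f"{benchmark}: rank #{rank} of 45, ahead of {share_ranked_below(rank, 45)}%")
```

Under this reading, a higher percentage is better, and rank #30 of 45 sits in the lower half of the field.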

Strengths & Weaknesses

Strengths

  • Very cheap
  • Fast responses

Weaknesses

  • Struggles with multi-step reasoning

Compare with Similar-Priced Models

Model               Overall Score   Input price ($ per 1M tokens)
GPT-4o Mini         58              $0.150
Claude 3.5 Haiku    52              $0.800
Claude 3 Haiku      45              $0.250
GPT-3.5 Turbo       40              $0.500
OpenAI o1-mini      70              $1.10
OpenAI o3-mini      80              $1.10