Performance Scores

Overall

45
Rank #40 of 45 (ahead of 11% of models)

SWE-bench

38
Rank #40 of 45 (ahead of 11% of models)

LiveCodeBench

46
Rank #40 of 45 (ahead of 11% of models)

HumanEval

68
Rank #40 of 45 (ahead of 11% of models)

BigCodeBench

30
Rank #40 of 45 (ahead of 11% of models)
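
The percentage shown next to each rank is consistent with the share of models ranked below this one. A minimal sketch of that arithmetic, assuming rank 1 is the best; the function name is illustrative and does not come from the source:

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Percentage of models ranked below the given rank (assumes 1 = best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100 * (total - rank) / total


# Rank 40 of 45 leaves 5 models below: 100 * 5 / 45
print(round(share_ranked_below(40, 45)))  # 11
```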

Strengths & Weaknesses

Strengths

  • Small model, runs locally

Weaknesses

  • Limited capacity

Compare with Similar-Priced Models

Model               Overall Score   Input $/M
Microsoft Phi-4     45              $0.100
Claude 3.5 Haiku    52              $0.800
Claude 3 Haiku      45              $0.250
GPT-4o mini         58              $0.150
GPT-3.5 Turbo       40              $0.500
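
One way to read the table above is to relate score to price. The sketch below sorts the listed models by overall score divided by input price. This ratio is a hypothetical convenience metric, not one the page reports, and it assumes the price column is US dollars per million input tokens:

```python
# (model, overall score, input price) copied from the comparison table
models = [
    ("Microsoft Phi-4", 45, 0.100),
    ("Claude 3.5 Haiku", 52, 0.800),
    ("Claude 3 Haiku", 45, 0.250),
    ("GPT-4o mini", 58, 0.150),
    ("GPT-3.5 Turbo", 40, 0.500),
]

# Hypothetical metric: overall score points per dollar of input price
ranked = sorted(models, key=lambda m: m[1] / m[2], reverse=True)

for name, score, price in ranked:
    print(f"{name:<18} {score / price:7.1f}")
```

The ratio ignores output pricing and everything else a score does not capture, so it is only a rough screening aid.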