Performance Scores

Overall

65
Rank #22 of 45 — Top 51%

SWE-bench

58
Rank #22 of 45 — Top 51%

LiveCodeBench

68
Rank #22 of 45 — Top 51%

HumanEval

85
Rank #22 of 45 — Top 51%

BigCodeBench

50
Rank #23 of 45 — Top 49%
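
The figures above appear to follow two simple pieces of arithmetic, though the page does not state either formula. The sketch below checks both, and both are assumptions rather than documented methodology. The first is that the "Top N%" value is the share of models ranked below the given rank. The second is that the overall score is an unweighted mean of the four benchmark scores. The function names `share_ranked_below` and `mean_score` are hypothetical and chosen only for illustration.

```python
# Benchmark scores copied from the listing above.
scores = {
    "SWE-bench": 58,
    "LiveCodeBench": 68,
    "HumanEval": 85,
    "BigCodeBench": 50,
}
TOTAL_MODELS = 45


def share_ranked_below(rank: int, total: int) -> int:
    """Percentage of models ranked below `rank`, rounded to a whole number.

    Assumed formula: the source page does not document how "Top N%" is derived.
    """
    return round(100 * (total - rank) / total)


def mean_score(values) -> float:
    """Unweighted mean of the benchmark scores (assumed aggregation)."""
    values = list(values)
    return sum(values) / len(values)


print(mean_score(scores.values()))           # 65.25, listed overall score is 65
print(share_ranked_below(22, TOTAL_MODELS))  # 51, matches "Top 51%" at rank #22
print(share_ranked_below(23, TOTAL_MODELS))  # 49, matches "Top 49%" at rank #23
```

Under these assumptions, "Top 51%" at rank #22 means the model places ahead of roughly 51% of the 45 models listed, not that it falls within the best 51%.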

Strengths & Weaknesses

Strengths

  • Reliable
  • Good value

Weaknesses

  • Two generations behind

Compare with Similar-Priced Models

Model              Overall Score   Input $/M
Claude 3 Sonnet    65              $3.00
Claude Sonnet 4    78              $3.00
Claude 3.5 Sonnet  72              $3.00
GPT-4o             75              $2.50
Qwen 3.6 Plus      72              $3.00
Qwen Max           68              $1.60