Performance Scores

Overall

68
Rank #18 of 45 (ahead of 60% of models)

SWE-bench

60
Rank #21 of 45 (ahead of 53% of models)

LiveCodeBench

70
Rank #18 of 45 (ahead of 60% of models)

HumanEval

86
Rank #18 of 45 (ahead of 60% of models)

BigCodeBench

54
Rank #17 of 45 (ahead of 62% of models)
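
The percentage shown next to each rank appears to be the share of the 45 listed models that rank below this one, not the fraction of the leaderboard the model falls within. The sketch below reproduces the listed figures under that assumption. The function name and the rounding rule are illustrative guesses, not something the page documents.

```python
def share_ranked_below(rank: int, total: int) -> int:
    """Percentage of the field ranked below `rank`, rounded to the nearest integer.

    Assumption: this is how the page derives its percentage; the source
    does not state the formula.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks taken from the score cards above; 45 models in total.
for benchmark, rank in [("Overall", 18), ("SWE-bench", 21), ("BigCodeBench", 17)]:
    print(benchmark, share_ranked_below(rank, 45))
```

With the ranks listed above, this gives 60, 53, and 62, matching the page.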

Strengths & Weaknesses

Strengths

  • Original breakthrough model

Weaknesses

  • Two generations behind
  • Expensive: $30.00 per million input tokens

Compare with Similar-Priced Models

Model    Overall Score    Input $/M
GPT-4    68               $30.00