Performance Scores

Overall

70
Rank #14 of 45 (outranks 69% of models)

SWE-bench

64
Rank #14 of 45 (outranks 69% of models)

LiveCodeBench

72
Rank #15 of 45 (outranks 67% of models)

HumanEval

88
Rank #14 of 45 (outranks 69% of models)

BigCodeBench

56
Rank #12 of 45 (outranks 73% of models)
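
The percentage shown beside each rank is consistent with the share of the 45 ranked models that place lower, rather than a "top N%" cutoff. The sketch below reproduces the listed figures under that assumption; the function name `share_outranked` and the round-to-nearest-integer step are illustrative choices, not something the page documents.

```python
def share_outranked(rank: int, total: int) -> int:
    """Percentage of models ranked below `rank`, rounded to the nearest integer.

    Assumption: the page derives its percentage as (total - rank) / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# Ranks that appear in the score cards above, all out of 45 models.
for rank in (12, 14, 15):
    print(f"Rank #{rank} of 45: outranks {share_outranked(rank, 45)}% of models")
```

Under this reading, rank #14 of 45 places the model in roughly the top third of the field, not the top 69%.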

Strengths & Weaknesses

Strengths

  • Proven model
  • Large context

Weaknesses

  • Superseded by GPT-4o

Compare with Similar-Priced Models

Model          Overall Score   Input price ($/M tokens)
GPT-4 Turbo    70              $10.00
OpenAI o3      85              $10.00