Performance Scores

Overall

76
Rank #8 of 45 (ahead of 82% of models)

SWE-bench

72
Rank #8 of 45 (ahead of 82% of models)

LiveCodeBench

79
Rank #8 of 45 (ahead of 82% of models)

HumanEval

89
Rank #13 of 45 (ahead of 71% of models)

BigCodeBench

64
Rank #6 of 45 (ahead of 87% of models)
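
The page does not say how these figures are derived. The sketch below checks two readings that happen to fit the numbers above: that the percentage beside each rank is the share of the 45 models ranked below this one, and that the overall score is the unweighted mean of the four benchmark scores. Both formulas, and the helper name share_ranked_below, are assumptions for illustration, not a documented method.

```python
from statistics import mean

TOTAL_MODELS = 45

# (score, rank) pairs copied from the listing above.
benchmarks = {
    "SWE-bench": (72, 8),
    "LiveCodeBench": (79, 8),
    "HumanEval": (89, 13),
    "BigCodeBench": (64, 6),
}


def share_ranked_below(rank: int, total: int) -> int:
    """Percentage of models ranked below the given rank (assumed formula)."""
    return round(100 * (total - rank) / total)


# Assumed aggregation: plain average of the four benchmark scores.
overall = mean(score for score, _ in benchmarks.values())

for name, (score, rank) in benchmarks.items():
    pct = share_ranked_below(rank, TOTAL_MODELS)
    print(f"{name}: score {score}, rank #{rank}, ahead of {pct}% of models")

print(f"Overall (mean of benchmarks): {overall}")
```

Under these assumptions the computed percentages are 82, 82, 71, and 87, and the mean is 76, all matching the listed values. The match does not prove that the site computes them this way.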

Strengths & Weaknesses

Strengths

  • Large context window (1M tokens)
  • Good at code understanding

Weaknesses

  • Less consistent code generation

Compare with Similar-Priced Models

Model             Overall Score   Input $/M
Gemini 2.5 Pro    76