Performance Scores

Overall

78
Rank #6 of 45 (87th percentile)

SWE-bench

74
Rank #6 of 45 (87th percentile)

LiveCodeBench

82
Rank #5 of 45 (89th percentile)

HumanEval

92
Rank #7 of 45 (84th percentile)

BigCodeBench

64
Rank #6 of 45 (87th percentile)
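
The percentage shown beside each rank is consistent with the share of the 45 ranked models that place below the given rank. The page does not state its formula, so the sketch below is an assumption inferred from the listed figures, and the function name percentile_from_rank is hypothetical.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Share of ranked models placed below `rank`, as a whole percentage.

    Assumed formula, inferred from the figures on this page:
    (total - rank) / total, rounded to the nearest integer.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round((total - rank) / total * 100)


# Ranks listed on this page, each out of 45 models.
ranks = {
    "Overall": 6,
    "SWE-bench": 6,
    "LiveCodeBench": 5,
    "HumanEval": 7,
    "BigCodeBench": 6,
}

for name, rank in ranks.items():
    pct = percentile_from_rank(rank, 45)
    print(f"{name}: rank #{rank} of 45, ahead of {pct}% of models")
```

Under this reading, rank #6 of 45 gives 39/45, which rounds to 87, matching the figure listed for the Overall score.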

Strengths & Weaknesses

Strengths

  • Price-performance leader
  • Strong at web development
  • Excellent code review

Weaknesses

  • Struggles with complex algorithms
  • Less consistent on system design

Compare with Similar-Priced Models

Model               Overall Score   Input $/M tokens
Claude Sonnet 4     78              $3.00
Claude 3.5 Sonnet   72              $3.00
Claude 3 Sonnet     65              $3.00
GPT-4o              75              $2.50
Qwen 3.6 Plus       72              $3.00
Qwen Max            68              $1.60