Performance Scores

Overall

70
Rank #14 of 45 (ahead of 69% of models)

SWE-bench

64
Rank #14 of 45 (ahead of 69% of models)

LiveCodeBench

72
Rank #15 of 45 (ahead of 67% of models)

HumanEval

88
Rank #14 of 45 (ahead of 69% of models)

BigCodeBench

56
Rank #12 of 45 (ahead of 73% of models)
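
The page does not say how the overall score or the percentage beside each rank is calculated. The listed figures happen to be consistent with two simple rules: the overall score matches the unweighted mean of the four benchmark scores, and each percentage matches the share of ranked models placed below the given rank. The sketch below checks that arithmetic. Both formulas and the function names are assumptions inferred from the numbers, not a documented methodology.

```python
# Benchmark scores as listed above.
BENCHMARKS = {
    "SWE-bench": 64,
    "LiveCodeBench": 72,
    "HumanEval": 88,
    "BigCodeBench": 56,
}


def overall_score(scores):
    """Assumed aggregation: unweighted mean of the benchmark scores."""
    return sum(scores.values()) / len(scores)


def percent_outranked(rank, total):
    """Assumed formula: percentage of ranked models placed below `rank`."""
    return round(100 * (total - rank) / total)


print(overall_score(BENCHMARKS))   # 70.0
print(percent_outranked(14, 45))   # 69
print(percent_outranked(15, 45))   # 67
print(percent_outranked(12, 45))   # 73
```

Under these assumptions, rank #14 of 45 leaves 31 models below it, and 31 / 45 rounds to 69%.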

Strengths & Weaknesses

Strengths

  • Strong reasoning
  • X integration

Weaknesses

  • Newer model
  • Limited ecosystem

Compare with Similar-Priced Models

Model              Overall Score  Input $/M tokens
Grok 3             70             $3.00
Claude Sonnet 4    78             $3.00
Claude 3.5 Sonnet  72             $3.00
Claude 3 Sonnet    65             $3.00
GPT-4o             75             $2.50
Qwen 3.6 Plus      72             $3.00