Microsoft Phi-4 — Benchmark Results
Developer: Microsoft
Overall Score: 45/100
Performance Scores
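| Benchmark | Score | Rank | Ahead of |
|---|---|---|---|
| Overall | 45 | #40 of 45 | 5 of 45 models (~11%) |
| SWE-bench | 38 | #40 of 45 | 5 of 45 models (~11%) |
| LiveCodeBench | 46 | #40 of 45 | 5 of 45 models (~11%) |
| HumanEval | 68 | #40 of 45 | 5 of 45 models (~11%) |
| BigCodeBench | 30 | #40 of 45 | 5 of 45 models (~11%) |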
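A rank of #40 of 45 can be turned into a percentage with simple arithmetic. The sketch below assumes rank 1 is the best model and that there are no ties; the function names are hypothetical helpers for illustration, not part of any benchmark tooling.

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Fraction of ranked models placing below `rank` (rank 1 = best, no ties assumed)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


def share_at_or_above(rank: int, total: int) -> float:
    """Fraction of ranked models placing at or above `rank` (rank 1 = best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


# Rank #40 of 45: 5 models place below, 40 place at or above.
print(f"ranked below:       {share_ranked_below(40, 45):.1%}")
print(f"ranked at or above: {share_at_or_above(40, 45):.1%}")
```

With 45 ranked models, placing 40th means outranking 5 of them (about 11.1%), while 40 of 45 (about 88.9%) place at or above that position.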
Strengths & Weaknesses
Strengths
- Small model (14B parameters) that can run locally
Weaknesses
- Limited capacity relative to larger models; its weakest results are on BigCodeBench (30) and SWE-bench (38)
Compare with Similar-Priced Models
| Model | Overall Score | Input price (USD per 1M tokens) |
|---|---|---|
| Microsoft Phi-4 | 45 | $0.100 |
| Claude 3.5 Haiku | 52 | $0.800 |
| Claude 3 Haiku | 45 | $0.250 |
| GPT-4o mini | 58 | $0.150 |
| GPT-3.5 Turbo | 40 | $0.500 |
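One rough way to read this table is overall score points per dollar of input price. The sketch below divides the two columns using only the figures listed; it ignores output-token pricing, which the table does not give, so it is a simple value ratio rather than a cost model. The helper names are hypothetical.

```python
# Figures copied from the comparison table: (overall score, input USD per 1M tokens).
MODELS = {
    "Microsoft Phi-4": (45, 0.100),
    "Claude 3.5 Haiku": (52, 0.800),
    "Claude 3 Haiku": (45, 0.250),
    "GPT-4o mini": (58, 0.150),
    "GPT-3.5 Turbo": (40, 0.500),
}


def points_per_dollar(score: float, input_price: float) -> float:
    """Overall score divided by input price (USD per 1M input tokens)."""
    return score / input_price


def rank_by_value(models: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (name, points per dollar) pairs, highest ratio first."""
    ratios = [(name, points_per_dollar(s, p)) for name, (s, p) in models.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


for name, ratio in rank_by_value(MODELS):
    print(f"{name:<18} {ratio:7.1f}")
```

On this measure Phi-4 comes first (about 450 points per dollar), ahead of GPT-4o mini (about 387), even though GPT-4o mini has the higher overall score.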