ASAP · Ai Soon As Possible · AI & tech, delivered fastest

Anthropic's Fable 5 Takes No. 1 on the DeepSWE Coding Benchmark

ASAP
2026-06-13 · 2 min read

Anthropic's Claude Fable 5 has claimed the top spot in the coding-agent evaluation run by AI benchmarking firm Artificial Analysis. Artificial Analysis replaced SWE-Bench Pro, the benchmark it previously used, with Datacurve's "DeepSWE," and Fable 5 paired with Claude Code took the lead with a score of 77. OpenAI's GPT-5.5 (Codex) followed with 76. Fable 5 was released on June 9, 2026, and an independent benchmark has now confirmed its coding ability.

What Was Announced

Artificial Analysis announced Fable 5 as No. 1 in its coding-agent rankings. The configuration pairing Claude Code with Fable 5 [max] took the lead on the Coding Agent Index with a score of 77, followed by OpenAI Codex + GPT-5.5 [xhigh] at 76 and Claude Code + Opus 4.8 [max] at 73. The first-place finish on an independent evaluation came right after the model's June 9, 2026 launch.
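
To make the gaps concrete, here is a minimal Python sketch that ranks the three configurations and computes each one's distance from the leader. The scores are the ones reported above. The function name `rank_with_gaps` and the data layout are illustrative assumptions, not anything published by Artificial Analysis.

```python
# Coding Agent Index scores as reported in this article.
scores = {
    "Claude Code + Fable 5 [max]": 77,
    "OpenAI Codex + GPT-5.5 [xhigh]": 76,
    "Claude Code + Opus 4.8 [max]": 73,
}


def rank_with_gaps(scores):
    """Return (rank, name, score, gap_to_leader) tuples, highest score first.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ordered[0][1]
    return [
        (rank, name, score, top_score - score)
        for rank, (name, score) in enumerate(ordered, start=1)
    ]


leaderboard = rank_with_gaps(scores)
for rank, name, score, gap in leaderboard:
    print(f"{rank}. {name}: {score} (gap to leader: {gap})")
```

On these figures the second-place configuration trails by 1 point and the third by 4.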

What Is the DeepSWE Benchmark?

DeepSWE is a benchmark that measures an AI's coding ability using real-world software-development tasks built entirely from scratch. Artificial Analysis said it had concluded that SWE-Bench Pro, which it used previously, had become prone to inflated scores because of issues such as repository-history leakage, and it replaced that benchmark with DeepSWE, created by Datacurve. DeepSWE aims to reduce "benchmark gaming," in which models solve tasks by memorizing training data, by posing new tasks that do not rely on publicly available code history.

How Strong Is Fable 5 at Coding?

Fable 5 has posted top-tier results across multiple coding metrics. According to published figures, Fable 5 scored 95.0% on SWE-bench Verified and 80.0% on SWE-Bench Pro, and ranked first on the code-focused evaluation FrontierCode. Together with the new No. 1 finish on the DeepSWE-based Coding Agent Index, these results place Fable 5 among the strongest coding models available at its launch.

Why It Matters

This result speaks to both benchmark credibility and the competitive landscape among models. An evaluator retiring a benchmark that had grown vulnerable to gaming and replacing it with a new one reflects a broader push to preserve trust in AI performance figures. Moreover, with only a single point separating Fable 5 and GPT-5.5, the result reveals just how fierce the race for the lead is in coding agents. For developers, it offers a useful reference for gauging which model-and-tool combinations perform better on real work.


Sources: Artificial Analysis · AI Times · LLM-Stats
