Claude Fable 5 Benchmarks Explained: Real Scores, the Opus 4.8 Fallback, and What the Numbers Actually Mean
June 10, 2026
Anthropic released Claude Fable 5 on June 9, 2026, calling it "state-of-the-art on nearly all tested benchmarks." That claim is mostly true — but the benchmark table comes with footnotes that most coverage skips over, and they change how you should read the numbers.
This post breaks down the headline scores, explains the one big caveat (some published numbers reflect Mythos 5, not the Fable 5 you can actually use), and covers the questions people are already asking on Reddit: why does Fable 5 keep switching to Opus 4.8, and is it worth the price?
The headline benchmark numbers
Here are the scores from Anthropic's launch materials and early third-party testing:
| Benchmark | Claude Fable 5 | Comparison |
|---|---|---|
| SWE-Bench Pro (coding) | 80.3% | GPT-5.5: 58.6%; ~11 points ahead of Opus 4.8 |
| FrontierCode Diamond (Cognition) | Highest of any frontier model | More than 2x Opus 4.8, even at medium effort |
| Spatial reasoning | 38.6% | Opus 4.8: 14.5% (about 2.7x) |
| Harvey Legal Agent Benchmark | 13.3% | GPT-5.5: 2.1%; Gemini: 0.0% |
| GDPpdf (visual document reasoning, no tools) | 29.8% | Opus 4.8: 22.5% |
| Artificial Analysis Intelligence Index | 65 | Highest tested at max effort |
The pattern Anthropic emphasizes: the longer and harder the task, the bigger Fable 5's lead. On quick one-shot questions, the gap versus Opus 4.8 is small. On multi-hour agentic work, it's dramatic.
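To make the table easier to read, here is a minimal sketch that turns each row with two numeric scores into a point gap and a ratio. The scores are copied from the table above; the `ROWS` layout and the `compare` helper are illustrative names, not anything from Anthropic's materials.

```python
# Scores copied from the table above, in percent. Only rows that list two
# numeric scores are included.
ROWS = [
    # (benchmark, Fable 5 score, comparison model, comparison score)
    ("SWE-Bench Pro", 80.3, "GPT-5.5", 58.6),
    ("Spatial reasoning", 38.6, "Opus 4.8", 14.5),
    ("Harvey Legal Agent Benchmark", 13.3, "GPT-5.5", 2.1),
    ("GDPpdf", 29.8, "Opus 4.8", 22.5),
]


def compare(fable: float, other: float) -> tuple[float, float]:
    """Return (gap in percentage points, ratio) for two percent scores."""
    return round(fable - other, 1), round(fable / other, 2)


for name, fable, rival, rival_score in ROWS:
    gap, ratio = compare(fable, rival_score)
    print(f"{name}: +{gap} points, {ratio}x vs {rival}")
```

The two figures can tell different stories. The Harvey row is an 11.2-point gap but a ratio above 6x, because both scores are low; the SWE-Bench Pro row is a much larger 21.7-point gap but only about 1.37x.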
The caveat nobody mentions: starred rows
Anthropic's published benchmark table shows, for each test, the higher of the Mythos 5 and Fable 5 scores. The two models share the same underlying weights, so the scores are usually within a few points of each other. On starred rows covering cybersecurity, biology, and some reasoning tasks, though, the published number reflects Mythos 5, the unrestricted version available only to vetted partners.
Fable 5 ships with blocking safeguards in exactly those areas. So if you're evaluating Fable 5 for security or bio work, its real-world performance on those tasks lands closer to Opus 4.8, not the headline figure. For coding, knowledge work, and vision — the areas most people care about — the published scores are the scores you get.
Why does Fable 5 keep falling back to Opus 4.8?
This is the most common complaint in the first 24 hours, so here's what's happening.
Fable 5 runs safety classifiers covering three areas: cybersecurity, biology and chemistry, and distillation attempts. When a request trips one, Claude Opus 4.8 answers instead, and you're told it happened. Anthropic says this occurs in fewer than 5% of sessions, and admits the classifiers are deliberately tuned to be cautious at launch — benign requests sometimes trigger them. Andrej Karpathy described the launch-day safeguards as "a little too trigger happy."
Three practical details:
- You are not charged Fable 5 prices for requests that fall back — they bill at Opus 4.8 rates.
- On the API, fallback isn't fully automatic the way it is in the Claude apps; developers need to configure the Fallback API.
- Anthropic says false positives will decrease as it tunes the classifiers post-launch.
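If you want to check the "fewer than 5%" figure against your own traffic, a rough tally is enough. The sketch below assumes you keep your own request log with two hypothetical fields, `requested_model` and `answered_by`; those field names are made up for illustration and are not part of any Anthropic API. Note that it counts requests, while Anthropic's figure is per session, so the two numbers are not directly comparable.

```python
FABLE = "claude-fable-5"  # model string quoted in this post


def fallback_rate(records: list[dict]) -> float:
    """Fraction of Fable 5 requests that were answered by a different model.

    Each record is a dict from your own logging with the HYPOTHETICAL keys
    "requested_model" and "answered_by".
    """
    requested = [r for r in records if r["requested_model"] == FABLE]
    if not requested:
        return 0.0
    fell_back = sum(1 for r in requested if r["answered_by"] != FABLE)
    return fell_back / len(requested)


# Toy log: 19 requests answered by Fable 5, 1 answered by something else.
log = [{"requested_model": FABLE, "answered_by": FABLE}] * 19
log.append({"requested_model": FABLE, "answered_by": "fallback-model"})
print(f"{fallback_rate(log):.1%}")
```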
Fable 5 vs Mythos 5: what's the actual difference?
Same model, different gates. Claude Mythos 5 uses identical underlying weights with the cyber safeguards (and, for some partners, the bio safeguards) lifted. It is restricted to Project Glasswing cybersecurity partners and, soon, select biology researchers. Fable 5 is the general-release version with those safeguards active. The name is the tell: fabula is Latin for "that which is told," a cousin of the Greek mythos. The safeguards are the only thing separating them.
Real examples of what Fable 5 can do
Benchmarks are abstract; these are the launch examples worth knowing about.
The Stripe migration. In early testing on a 50-million-line Ruby codebase, Fable 5 completed a codebase-wide migration in a single day. Stripe estimated the same work at over two months for a full engineering team.
Pokémon FireRed with vision only. Earlier Claude models needed elaborate helper harnesses (maps, navigation aids, extra game state) to play Pokémon. Fable 5 beat FireRed start to finish using nothing but raw screenshots.
Factorio, autonomously. Fable 5 plays the factory-building game on its own — strategizing, planning, and building an automated factory without human direction.
Solar eclipse prediction from first principles. It built a solar system simulation deriving planetary orbital motion from physics first principles, then used it to predict eclipses.
Memory that compounds. Playing the deck-builder Slay the Spire with persistent file-based memory, Fable 5 improved three times more than Opus 4.8 did from the same setup, and reached the final act three times more often.
Pricing and the quiet deadline
Fable 5 costs $10 per million input tokens and $50 per million output tokens, roughly double Opus 4.8, with a 1-million-token context window and a 90% prompt-caching discount on input. The model string is claude-fable-5.
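As a worked example of those list prices, here is a small cost estimator. It is a sketch under two assumptions that the launch materials quoted here do not spell out: the 90% discount applies to input tokens read from the cache, and there is no extra charge for writing to the cache. The function and constant names are illustrative.

```python
INPUT_PER_MTOK = 10.0    # dollars per million input tokens (from this post)
OUTPUT_PER_MTOK = 50.0   # dollars per million output tokens (from this post)
CACHED_INPUT_MULTIPLIER = 0.10  # ASSUMPTION: cached input billed at 10% (90% off)


def request_cost(input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Estimated dollar cost of one Fable 5 request at list price."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_input_tokens
    total = (
        fresh * INPUT_PER_MTOK
        + cached_input_tokens * INPUT_PER_MTOK * CACHED_INPUT_MULTIPLIER
        + output_tokens * OUTPUT_PER_MTOK
    )
    return total / 1_000_000


# 200k input tokens (150k of them cached) and 8k output tokens.
print(f"${request_cost(200_000, 8_000, cached_input_tokens=150_000):.2f}")  # $1.05
# The same request with nothing cached.
print(f"${request_cost(200_000, 8_000):.2f}")  # $2.40
```

On a long-context request like this one, caching matters more than the output price: the cached version costs less than half as much.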
The detail worth flagging: Fable 5 is included free on Pro, Max, Team, and seat-based Enterprise plans only through June 22, 2026. From June 23 it requires usage credits until Anthropic has capacity to restore it as a standard plan feature. If you want to test it without paying extra, the window is now.
Also budget-relevant: this is a reasoning-heavy model that thinks longer and writes more tokens per task. Early users report it consuming subscription usage limits noticeably faster than Opus 4.8 — though Anthropic and customers like Cognition report it finishing jobs in fewer turns, which offsets some of the per-token premium on hard tasks.
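The trade-off between more tokens per turn and fewer turns can be put in one line of arithmetic. The sketch below treats "roughly double" as exactly 2x and assumes both models use a similar mix of input and output tokens, so a single price multiple applies; the example ratios are hypothetical, not measured.

```python
PRICE_MULTIPLE = 2.0  # ASSUMPTION: "roughly double" treated as exactly 2x


def relative_job_cost(turn_ratio: float, tokens_per_turn_ratio: float,
                      price_multiple: float = PRICE_MULTIPLE) -> float:
    """Fable 5 job cost divided by Opus 4.8 job cost.

    turn_ratio: Fable 5 turns / Opus 4.8 turns for the same job.
    tokens_per_turn_ratio: Fable 5 tokens per turn / Opus 4.8 tokens per turn.
    A result below 1.0 means Fable 5 is the cheaper option for that job.
    """
    return price_multiple * turn_ratio * tokens_per_turn_ratio


# Hypothetical job: 40% as many turns, 1.5x the tokens per turn.
print(f"{relative_job_cost(0.4, 1.5):.2f}")  # 1.20
# Hypothetical job: 30% as many turns, 1.5x the tokens per turn.
print(f"{relative_job_cost(0.3, 1.5):.2f}")  # 0.90
```

Under these assumptions, Fable 5 breaks even when its total token use is half of what Opus 4.8 would need for the same job.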
So is Claude Fable 5 worth it?
For long, hard, autonomous work (large migrations, multi-step agent workflows, deep research, dense document analysis), yes: that's exactly the gap it was built to widen, and the early third-party results back it up. For short, latency-sensitive, or high-volume tasks, Opus 4.8 at roughly half the price (or Sonnet at far less) is usually the smarter default. The pattern emerging in the first days after launch: route by task difficulty, and save Fable 5 for the problems that were previously out of reach.
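Routing by difficulty can start as a few lines of code. This is a minimal sketch, not a recommended policy: the `Task` fields, the step threshold, and the default model placeholder are all assumptions made for illustration. Only `claude-fable-5` is a model string taken from this post.

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"                   # model string quoted in this post
DEFAULT_MODEL = "opus-4.8-placeholder"     # ASSUMPTION: not a real model string


@dataclass
class Task:
    expected_steps: int       # rough guess at agent steps or tool calls
    latency_sensitive: bool   # True if a user is waiting on the answer


def pick_model(task: Task, hard_step_threshold: int = 20) -> str:
    """Send long, non-interactive jobs to Fable 5; everything else to the default."""
    if task.latency_sensitive:
        return DEFAULT_MODEL
    if task.expected_steps >= hard_step_threshold:
        return FABLE
    return DEFAULT_MODEL


print(pick_model(Task(expected_steps=50, latency_sensitive=False)))
print(pick_model(Task(expected_steps=3, latency_sensitive=False)))
```

The threshold of 20 steps is arbitrary. In practice you would tune it against your own cost and success-rate data.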
FAQ
What is Claude Fable 5's SWE-Bench Pro score? 80.3%, about 11 points ahead of Claude Opus 4.8 and roughly 22 points ahead of GPT-5.5's 58.6%.
Is Fable 5 the same as Mythos 5? Same underlying model. Fable 5 has safety classifiers for cybersecurity, bio/chem, and distillation; Mythos 5 has some of those lifted and is restricted to vetted partners.
Why did Fable 5 answer my question as Opus 4.8? A safety classifier flagged the request. This happens in under 5% of sessions, you're billed at Opus rates for it, and Anthropic says false-positive rates will improve.
How long is Fable 5 free on paid plans? June 9 through June 22, 2026, on Pro, Max, Team, and seat-based Enterprise. From June 23 it requires usage credits.
What is Fable 5's context window? 1,000,000 tokens, with text, image, and file inputs.
Sources: Anthropic's launch announcement, the Fable 5 / Mythos 5 system card, and early third-party testing reported by Cognition, Harvey, Hebbia, and Artificial Analysis.