Claude Opus 4.8: Benchmarks, Alignment and What Actually Changed
Anthropic released Claude Opus 4.8 today at the same price as 4.7. It leads on SWE-Bench Pro (69.2%), Humanity's Last Exam reasoning, computer use, and legal benchmarks. Misaligned behavior dropped to Mythos Preview levels. Here is every number.
8 min read









