Anthropic Releases Claude Opus 4.6 With Record-Breaking Benchmarks

Anthropic has released Claude Opus 4.6, positioning it as the strongest Opus model so far. The update focuses on performance in real work, especially coding, long-running agent tasks, and large-context handling. It is also the first Opus-class model to offer a 1-million-token context window, now available in beta.

Claude Opus 4.6 builds on Opus 4.5 but moves faster through routine steps and slows down where problems get hard. It plans more carefully, reviews its own work better, and stays reliable inside large codebases. These changes matter for developers and teams that rely on AI over long sessions, not just short prompts.

The model also targets everyday knowledge work. It handles financial analysis, research, and document creation with fewer corrections. Inside Cowork and Claude Code, Opus 4.6 can run tasks autonomously and coordinate work across tools, which reduces manual back and forth.

Key benchmark results

Claude Opus 4.6 leads several major evaluations that test practical, high-value tasks.

  • Terminal-Bench 2.0: Highest score among frontier models for agentic coding.
  • Humanity’s Last Exam: Top performance on complex, multidisciplinary reasoning.
  • GDPval-AA: Outperforms the next-best model, OpenAI’s GPT-5.2, by about 144 Elo points, and its own predecessor, Claude Opus 4.5, by roughly 190 Elo points.
  • BrowseComp: Best score for finding hard-to-locate information online.

These results point to strength in economically valuable work, such as finance, legal research, and deep technical analysis.
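To put the Elo gaps in perspective, a rating difference can be converted into an expected head-to-head win rate. The sketch below assumes GDPval-AA ratings follow the standard logistic Elo formula with a 400-point scale, which the article does not confirm; the function name is illustrative.

```python
def expected_win_rate(elo_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model.

    Assumption: the benchmark uses the conventional 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Elo gaps quoted in the article for GDPval-AA.
for label, gap in [("vs. GPT-5.2", 144), ("vs. Claude Opus 4.5", 190)]:
    print(f"{label}: {expected_win_rate(gap):.1%}")
# vs. GPT-5.2: 69.6%
# vs. Claude Opus 4.5: 74.9%
```

Under that assumption, a 144-point lead corresponds to winning roughly seven out of ten pairwise comparisons.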

Long-context performance

A major upgrade is context handling. Opus 4.6 reduces what developers often call context rot.

  • Supports 1M token context in beta.
  • Scores 76 percent on the 8-needle, 1M-token variant of MRCR v2, compared to 18.5 percent for Sonnet 4.5.
  • Tracks details across hundreds of thousands of tokens with less drift.
  • Recovers buried facts that earlier Opus models missed.

This makes the model more reliable for audits, codebase reviews, and large document analysis.

Product and API updates

Anthropic pairs the model release with platform upgrades.

  • Adaptive thinking lets the model decide when deeper reasoning is needed.
  • Four effort levels give control over speed, cost, and depth.
  • Context compaction summarizes older context to extend long-running tasks.
  • Output supports up to 128k tokens.
  • Pricing stays at $5 input and $25 output per million tokens, with premium rates for very large prompts.
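As a rough illustration of the base pricing, the sketch below estimates the cost of a single request from its token counts. It uses only the $5 and $25 per-million-token rates quoted above; the premium rates for very large prompts are not given here, so they are deliberately not modeled, and the function name is hypothetical.

```python
# Base rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost at base rates (premium large-prompt rates ignored)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 150k-token prompt that produces an 8k-token answer.
print(f"${estimate_cost_usd(150_000, 8_000):.2f}")  # $0.95
```

Because output tokens cost five times as much as input tokens, long generations can dominate the bill even when the prompt is large.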

Safety and availability

Anthropic reports that Opus 4.6 matches or exceeds prior models on safety, with low rates of misaligned behavior and fewer unnecessary refusals. New cybersecurity probes and safeguards accompany the model’s stronger cybersecurity capabilities.

Claude Opus 4.6 is available now on claude.ai, through the Claude API, and on major cloud platforms. For teams that need depth, scale, and consistency, this release marks a clear step forward.
