Anthropic Releases Claude Opus 4.8 with Stronger Agentic Reasoning and Honest Code Review
In this article
Anthropic has released Claude Opus 4.8, an incremental but meaningfully improved successor to Opus 4.7. The new model delivers stronger performance across coding, agentic, reasoning, and knowledge-work benchmarks while also carrying alignment improvements that make it more honest about the quality of its own outputs. It launches at the same price as its predecessor and is available immediately via the Claude API as claude-opus-4-8.
Benchmark Highlights
Early testing across a range of independent benchmarks points to a model that is more reliable in end-to-end autonomous tasks:
- Super-Agent benchmark (covering translation, deep research, slide-building, and analysis): Opus 4.8 is reportedly the only model to complete every case from start to finish, beating prior Opus versions and GPT-5.5 at equivalent cost.
- CursorBench: Opus 4.8 outperforms all prior Opus models at every effort level, with noticeably more efficient tool calling — fewer steps for the same quality of output.
- Legal Agent Benchmark: Opus 4.8 achieves the highest score on record and becomes the first model to exceed 10% on the all-pass standard, a threshold considered meaningful for autonomous handling of substantive legal work.
- Online-Mind2Web (browser and computer-use): Opus 4.8 scores 84%, a clear improvement over both Opus 4.7 and GPT-5.5.
- Finance and document workflows: Testers report better citation precision, denser outputs, and a stronger tendency to proactively flag problems in the data — rather than leaving errors for the user to discover.
Honesty as a Core Improvement
One of the most practically significant changes in Opus 4.8 is what Anthropic describes as improved honesty about the model's own work. AI models have a common failure mode where they report progress or correctness even when the evidence is thin. Opus 4.8 addresses this directly: Anthropic's evaluations show it is approximately four times less likely than Opus 4.7 to allow flaws in code it has written to go unremarked.
Alignment assessments conducted before release found that Opus 4.8 reaches new highs on prosocial traits including supporting user autonomy and acting in the user's best interest. Rates of misaligned behavior — such as deception or cooperation with misuse — are substantially lower than in Opus 4.7, and comparable to Claude Mythos Preview, Anthropic's most carefully aligned model to date.
New Features Launching Alongside Opus 4.8
Dynamic workflows in Claude Code (research preview): This is the most significant feature launch alongside the model. Claude Code can now plan a large task, spin up hundreds of parallel subagents in a single session, and then verify its outputs before returning results. The practical application: codebase-scale migrations across hundreds of thousands of lines — from kickoff through to merge, with the existing test suite as the quality bar. Dynamic workflows are available on Claude Code's Enterprise, Team, and Max plans.
Effort control: Users on claude.ai now have a direct control next to the model selector that determines how hard Claude works on a response. Higher effort settings trigger more frequent and deeper thinking at the cost of greater token consumption and slower responses; lower settings trade quality for speed and slower rate-limit burn. The control is available across all plans.
Mid-task system prompt updates via the Messages API: Developers can now inject system-level entries inside the messages array. This means an agent's permissions, token budgets, or environment context can be updated mid-task without breaking the prompt cache or needing to route the update through a user turn — a meaningful quality-of-life improvement for complex agentic harnesses.
Pricing and Effort Levels
Standard pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Fast mode — where the model operates at 2.5× normal speed — has been cut to $10 per million input tokens and $50 per million output tokens, which is three times cheaper than fast mode was on previous Opus models.
Opus 4.8 defaults to high effort, which Anthropic judges to deliver the best overall balance of quality and user experience on typical tasks. For demanding or long-running asynchronous work, users can select "extra" (xhigh in Claude Code) or "max" effort levels, which spend more tokens to achieve higher accuracy. Anthropic has raised rate limits in Claude Code to accommodate higher token usage at elevated effort settings.
What Comes After Opus
Anthropic signals two directions for the near term. First, the company is working on lower-cost models that can deliver capabilities comparable to Opus — addressing the cost barrier for high-volume workloads. Second, it plans to release a new class of model with intelligence above the current Opus ceiling.
A small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing. Mythos-class models require stronger cyber safeguards before broader release; Anthropic says it is making rapid progress on those safeguards and expects to make Mythos-class capability available to all customers within weeks.