Anthropic launches Claude Sonnet 4.5 to bolster coding for enterprises

Claude Sonnet 4.5 enhances generative AI coding, reasoning, and long-duration task work; Anthropic adds API tools, an Agent SDK, code execution, checkpoints, and safety features for enterprises and startups.

Manisha Sharma

Anthropic on Monday announced Claude Sonnet 4.5, an update the company describes as focused on coding and agent-based workflows. The company said the model shows improvements in reasoning, math, and the management of long-duration tasks. On benchmarks cited by Anthropic, Sonnet 4.5 posted the top score on SWE-bench Verified, an evaluation of real-world software coding skills, and scored 61.4% on the OSWorld benchmark for computer-use tasks, up from 42.2% for the earlier Sonnet 4 release.

Anthropic said the model is available via the Claude API at the same pricing as Sonnet 4: $3 per million input tokens and $15 per million output tokens. The company also said the Claude apps now support code execution and file creation within conversations, and that the Claude API has added context-editing and memory tools to support longer-running tasks.
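As a back-of-the-envelope illustration of that pricing, the sketch below estimates the cost of a single request. The token counts are assumptions chosen for the example, not figures from the announcement.

```python
# Quoted Sonnet pricing: $3 per million input tokens,
# $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Hypothetical request: a 20,000-token prompt and a 2,000-token completion.
print(f"${estimate_cost(20_000, 2_000):.2f}")  # prints "$0.09"
```

At these rates, output tokens dominate cost for generation-heavy workloads, which is why long autonomous coding runs are the scenario enterprises will want to budget for.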

“Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building complex agents,” the company said in its blog post. “It’s also the best model at using computers and shows substantial gains in reasoning and math.”

Claude Sonnet 4.5: developer tools and integrations

Anthropic positioned Sonnet 4.5 as part of a broader developer stack. The release expands integrations and tooling: Claude Sonnet 4.5 is integrated into Claude Code, which now includes checkpoints to save progress and roll back to earlier states, a refreshed terminal interface, and a native VS Code extension. Anthropic also said it released a Claude for Chrome extension for Max users on the waitlist.

Developers can also access the Claude Agent SDK, which Anthropic describes as the same infrastructure it uses internally to build Claude Code. In the company's words:

“The Agent SDK gives you the same foundation to build something just as capable for whatever problem you’re solving.”

Anthropic also introduced a temporary research preview called “Imagine with Claude”, which the company made available to Max subscribers for five days at claude.ai/imagine.

Claude Sonnet 4.5: enterprise positioning and safety

Anthropic framed the release around enterprise and power-user needs rather than consumer virality. The company said Sonnet 4.5 is stronger at financial and scientific reasoning and at using computers, and it highlighted sustained runs on long tasks: in internal tests the model reportedly built a web app from scratch, and Anthropic said one customer observed the model coding autonomously for 30 hours, compared with a seven-hour run for an earlier model.

Industry partnerships and platform integrations accompanied the launch: Microsoft said it would add new Microsoft 365 Copilot features powered by Anthropic models, including an “Agent Mode” in Excel and Word and an “Office Agent” in Copilot chat, with PowerPoint support to follow.

Anthropic described safety and alignment improvements alongside the capability gains. The company said Sonnet 4.5 shows reductions in misaligned behaviours, citing decreases in sycophancy, deception, and power-seeking, and said the model is released under Anthropic’s AI Safety Level 3 framework, which includes classifiers intended to flag potentially dangerous content.

Claude Sonnet 4.5: early reports and use cases

Anthropic reported that early users saw improved performance across finance, law, medicine, and STEM domains. The company highlighted real-world coding and multi-step workflows as the focus of this release, and it is marketing the model and its tooling to regulated industries and to teams that need a model to operate across multiple software tools.

A few questions the market will watch as Sonnet 4.5 is adopted:

• How repeatable are the long-run coding claims outside Anthropic’s internal tests and initial customers?
• How will enterprises validate OS dexterity and end-to-end agent behaviour before production deployment?
• How effective are the AI Safety Level 3 classifiers in real deployments, and how will Anthropic measure reductions in misaligned behaviour?
• Will the pricing tiers and tooling (API, Agent SDK, checkpoints, VS Code extension) make adoption straightforward for in-house engineering teams and startups alike?
• How quickly will partner integrations, such as the announced Microsoft 365 Copilot features, translate into enterprise product changes and user workflows?

Anthropic frames Claude Sonnet 4.5 as a step toward more capable, sustained agent and coding workflows, bundled with developer tooling and safety controls. The announcement places its emphasis on benchmarks, integrations, and safety frameworks; verifying those claims in diverse, independent enterprise settings will determine the model’s practical impact.