
Anthropic makes another breakthrough! Sonnet 4.6 operates computers with near-human capability, delivering performance comparable to the flagship Opus model at only one-fifth of the price.

wallstreetcn ·  Feb 18 03:21

Anthropic has launched a new version of its AI model, Claude Sonnet 4.6, designed to operate computers in more sophisticated ways and to make AI tools more effective at streamlining workflows.

Less than two weeks after the release of its new flagship model, Claude Opus 4.6, OpenAI's key competitor Anthropic has launched another significant product, Claude Sonnet 4.6, offering near-flagship-level intelligence at a mid-range price point. This represents a major shift in pricing dynamics within the AI industry.

On Wednesday, February 17, Eastern Time, Anthropic officially announced the launch of Claude Sonnet 4.6. The new model features comprehensive upgrades in programming, computer operation, long-text reasoning, agent planning, knowledge work, and design. It is priced the same as its predecessor, Sonnet 4.5, at $3 per million input tokens and $15 per million output tokens. Its performance reportedly approaches that of the flagship Opus model, which is priced at $15 per million input tokens and $75 per million output tokens, so Sonnet 4.6 costs only one-fifth as much.

For enterprises whose AI agents make millions of API calls a day, this shift in the cost-performance ratio is transformative. In computer operation, Sonnet 4.6 scored 72.5% on the standard OSWorld benchmark, approaching human-level performance less than 18 months after the capability was first introduced. During early testing, developers preferred Sonnet 4.6 over its predecessor, Sonnet 4.5, in approximately 70% of cases, and even favored it over the flagship Opus 4.5 model released in November last year in 59% of cases.

This launch comes as Anthropic accelerates its push into the enterprise market. According to reports, Anthropic completed a new funding round last Friday at a valuation of $380 billion, doubling its valuation from September of the previous year. Also on Wednesday, Infosys, an Indian IT giant, announced a partnership with Anthropic to integrate the Claude models into its Topaz AI platform for use in banking, telecommunications, and manufacturing. Anthropic has also opened its first office in India, located in Bangalore.

Computer operation capability improved fivefold in 16 months, approaching human-level performance.

The new model once again demonstrates that Anthropic has made particularly remarkable progress in operating computers.

When this feature was first introduced in October 2024, Anthropic acknowledged that it was 'still experimental—sometimes clumsy and error-prone.' Subsequent data shows that the company’s models have made astonishing progress: Sonnet 3.5 scored 14.9% on the OSWorld benchmark in October 2024; Sonnet 3.7 reached 28.0% in February 2025; Sonnet 4 hit 42.2% in June of the same year; Sonnet 4.5 climbed to 61.4% in October; and Sonnet 4.6, released this Wednesday, has now reached 72.5%.
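The progression above can be checked with a little arithmetic. The following is a minimal sketch, not an official calculation: the scores are the ones quoted in this article, the year of the Sonnet 4.6 release (2026) is inferred from the article's timeline rather than stated in it, and the names osworld_scores and months_between are hypothetical.

```python
# OSWorld scores (percent) as quoted in the article.
# The "2026-02" date for Sonnet 4.6 is inferred from the timeline, not stated.
osworld_scores = [
    ("Sonnet 3.5", "2024-10", 14.9),
    ("Sonnet 3.7", "2025-02", 28.0),
    ("Sonnet 4", "2025-06", 42.2),
    ("Sonnet 4.5", "2025-10", 61.4),
    ("Sonnet 4.6", "2026-02", 72.5),
]


def months_between(start: str, end: str) -> int:
    """Whole calendar months between two 'YYYY-MM' strings."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


first, last = osworld_scores[0], osworld_scores[-1]

# Ratio of the latest score to the first one.
improvement = last[2] / first[2]

# Time elapsed between the first and the latest release.
elapsed_months = months_between(first[1], last[1])

# Percentage-point gain from each release to the next.
gains = [
    round(later[2] - earlier[2], 1)
    for earlier, later in zip(osworld_scores, osworld_scores[1:])
]

print(f"{improvement:.2f}x over {elapsed_months} months")
print("Point gains per release:", gains)
```

On these figures the ratio comes to a little under 4.9, which is where the "fivefold in 16 months" description comes from; the largest single step is the 19.2-point gain from Sonnet 4 to Sonnet 4.5.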

The OSWorld benchmark presents hundreds of tasks in real-world software such as Chrome, LibreOffice, and VS Code, all running on a simulated computer with no special APIs or dedicated connectors. The model interacts with the computer as a human would, using virtual mouse clicks and a virtual keyboard. Anthropic stated that the model can handle tasks such as navigating complex spreadsheets, filling out multi-step web forms, and integrating information across multiple browser tabs.

This capability is crucial for enterprise applications. Nearly every organization has legacy software that is difficult to automate—insurance portals, government databases, enterprise resource planning systems, hospital scheduling tools—all built before the advent of APIs. A model that can view and interact with a screen like a human can automate these systems without the need to build custom connectors.

Jamie Cuffe, CEO of Pace, stated that Sonnet 4.6 achieved 94% accuracy on the company’s complex insurance computer-use benchmark, making it the best-performing of all Claude models. 'It performs fault reasoning and self-correction in ways we’ve never seen,' said Cuffe. Will Harvey, co-founder of Convey, described it as 'a clear improvement over all models we have tested in our evaluations.'

Anthropic also pointed out that there is a risk of prompt injection attacks in computer operations — where malicious actors hide instructions on websites to hijack models. The company’s assessment shows that Sonnet 4.6 has made significant improvements over Sonnet 4.5 in resisting such attacks.

Programming capabilities have significantly improved, with developers preferring it even to the previous flagship model.

In Claude Code, Anthropic's early tests found that developers preferred Sonnet 4.6 over Sonnet 4.5 in approximately 70% of cases. Users reported that the new model more effectively reads context before modifying code and merges shared logic rather than duplicating it, making it less frustrating to use over extended periods compared to earlier models.

Users even preferred Sonnet 4.6 over Opus 4.5, last November’s flagship model, in 59% of cases. They noted that Sonnet 4.6 significantly reduced tendencies toward over-engineering and "laziness," excelling particularly in following instructions. Reports indicated fewer false success claims, fewer hallucinations, and more consistent execution of multi-step tasks.

Early customer feedback highlighted improvements in frontend coding and financial analysis. Multiple testers independently described Sonnet 4.6’s visual outputs as noticeably more refined, with superior layout, animations, and design sense compared to previous models. Customers also required fewer rounds of iteration to achieve production-quality results.

Michael Truell, co-founder and CEO of Cursor, stated: "Claude Sonnet 4.6 surpasses Sonnet 4.5 in all aspects, including long-term tasks and more challenging problems." Joe Binder, Vice President of Product at GitHub, confirmed that the model "has already performed exceptionally well in complex code fixes, especially when searching across large codebases is critical. For teams running agent programming at scale, we are seeing strong resolution rates and the consistency that developers need."

David Loker, Vice President of CodeRabbit AI, described the model as "punching far above its weight in the vast majority of practical PR scenarios." Leo Tchourakov of Factory AI said the team "is shifting Sonnet traffic to this model." Brendan Falk, founder and CEO of Hercules, was even more direct: "Claude Sonnet 4.6 is the best model we've seen so far. It offers Opus 4.6-level accuracy, instruction-following, and user interface, but at a significantly lower cost."

Flagship performance at mid-range pricing, with costs for large-scale deployment sharply reduced.

The pricing strategy for Sonnet 4.6 is one of the most important highlights of this release. Pricing remains at $3 per million tokens for input and $15 per million tokens for output, the same as its predecessor, Sonnet 4.5. In contrast, Anthropic’s flagship Opus model is priced at $15 per million tokens for input and $75 per million tokens for output — five times the price of Sonnet.

According to Anthropic, the level of performance previously achievable only with Opus-class models — including on office tasks with actual economic value — can now be attained with Sonnet 4.6. For the thousands of enterprises currently deploying AI agents making millions of API calls daily, this cost calculation changes everything.

According to VentureBeat, in many of the categories that matter most to businesses, Sonnet 4.6 performs on par with or even surpasses models that cost five times as much to run. A company running an AI agent that processes 10 million tokens daily was previously forced to choose between cheaper but weaker results and rapidly growing spending for premium results. Sonnet 4.6 largely eliminates this trade-off.
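To make the 10-million-token example concrete, here is a rough sketch of the daily bill at the two price points quoted in this article. The article does not say how those tokens divide between input and output, so the 80/20 split below is purely an assumption for illustration, and daily_cost is a hypothetical helper rather than part of any Anthropic tooling.

```python
# Prices in USD per million tokens, as quoted in the article.
SONNET_PRICES = {"input": 3.0, "output": 15.0}
OPUS_PRICES = {"input": 15.0, "output": 75.0}

TOKENS_PER_DAY = 10_000_000  # workload size from the article's example
INPUT_SHARE = 0.8            # ASSUMPTION: 80% input, 20% output tokens


def daily_cost(total_tokens: int, input_share: float, prices: dict) -> float:
    """Daily cost in USD for a given token volume and input/output split."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (
        input_tokens / 1_000_000 * prices["input"]
        + output_tokens / 1_000_000 * prices["output"]
    )


sonnet_daily = daily_cost(TOKENS_PER_DAY, INPUT_SHARE, SONNET_PRICES)
opus_daily = daily_cost(TOKENS_PER_DAY, INPUT_SHARE, OPUS_PRICES)

print(f"Sonnet-tier pricing: ${sonnet_daily:,.2f} per day")
print(f"Opus-tier pricing:   ${opus_daily:,.2f} per day")
print(f"Ratio: {opus_daily / sonnet_daily:.1f}x")
```

Under the assumed split this works out to $54 a day at Sonnet pricing against $270 a day at the quoted Opus pricing. Because both the input and the output price differ by the same factor of five, the ratio stays at five whatever split is chosen; only the absolute dollar amounts change.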

Several early testers explicitly described how Sonnet 4.6 removes the necessity of using the more expensive Opus tier. Caitlin Colgrove, Chief Technology Officer of Hex Technologies, stated that her company is shifting most of its traffic to Sonnet 4.6, noting, "We see Opus-level performance across all tasks except the most challenging analytical ones, with a more efficient and flexible configuration. Under Sonnet pricing, it's an obvious choice for our workloads."

Ben Kus, Chief Technology Officer of Box, stated that the model outperforms Sonnet 4.5 by 15 percentage points in intensive reasoning-based question-answering on real-world enterprise documents. Michele Catasta, President of Replit, described the performance-to-cost ratio as "extraordinary." Ryan Wiggins of Mercury Banking put it more bluntly: "Claude Sonnet 4.6 is faster, cheaper, and more likely to get it right the first time. This combination of improvements is astonishing—we didn't expect to see it at this price point."

A context window of one million tokens enables long-term strategic planning.

Sonnet 4.6 comes equipped with a 1-million-token context window (beta), enough to hold an entire codebase, lengthy contracts, or dozens of research papers in a single request. More importantly, Anthropic claims the model can reason effectively across that entire context.

The company demonstrated this capability through an unusual evaluation. Vending-Bench Arena tests how well models run a simulated business over a long horizon, with different AI models competing to maximize profits. Without human prompting, Sonnet 4.6 devised a novel strategy: it invested heavily in capacity during the first ten simulated months, spending significantly more than competitors, then pivoted sharply to focus on profitability in the final stages. By the end of the 365-day simulation, the model had a balance of approximately $5,700, compared to about $2,100 for Sonnet 4.5.

This autonomous execution of multi-month strategic planning represents a qualitatively different capability, going beyond answering questions or generating code snippets. It is the type of long-term reasoning that makes AI agents suitable for practical business operations.

Rapid release cadence amid fierce competition

The release of Sonnet 4.6 comes at a time of intense competition in the AI industry. This marks Anthropic's second major AI model launch in less than two weeks, reflecting the fast-paced development required to remain competitive in the sector. Anthropic had just introduced Claude Opus 4.6 twelve days prior.

Anthropic's rapid progress has also accelerated the recent massive sell-off in software stocks. Investors are increasingly concerned that AI may disrupt these businesses, with the iShares Expanded Tech Software Sector ETF plummeting over 20% year-to-date. Sonnet 4.6 is unlikely to ease these concerns, as Anthropic stated the model would bring "significantly improved coding skills" to a broader user base.

According to Bloomberg, Anthropic's recent developments have sparked concerns on Wall Street. Earlier this month the company quietly released a tool that automates certain legal tasks, triggering a market sell-off as investors feared software firms might eventually become obsolete. After Anthropic released an updated Opus model designed for enhanced financial research, financial services stocks also fell sharply. These reactions reflect widespread concerns about which companies and services may ultimately be disrupted by AI.

According to TechCrunch, Anthropic CEO Dario Amodei stated, "There is a significant gap between AI models that work well in demonstrations and those that are effective in regulated industries," with Infosys helping to bridge this gap. TechCrunch also reported that India currently accounts for approximately 6% of global Claude usage, second only to the United States.

In the competitive landscape, Sonnet 4.6 outperformed Google's Gemini 3 Pro and OpenAI's GPT-5.2 on multiple benchmarks. GPT-5.2 lagged behind in agent-based computer use (38.2% vs. 72.5%) and agent financial analysis (59.0% vs. 63.3%), while the agent search scores were closer (77.9% vs. 74.7%). Gemini 3 Pro was competitive in visual reasoning and multilingual benchmarks, but it fell short on agent tasks, the category where enterprise investment is growing fastest.

According to CNBC, OpenAI is also in negotiations with investors for a funding round that could value the company at close to $100 billion.

Claude Sonnet 4.6 is now available across all Claude plans, Claude Cowork, Claude Code, API, and all major cloud platforms. Anthropic has also upgraded its free tier to default to Sonnet 4.6. Developers can access claude-sonnet-4-6 immediately through the Claude API.


Editor/Stephen

The translation is provided by third-party software.

