
Anthropic's late-night move wipes out a $50 billion industry! The era of code auditing is coming to an end.

wallstreetcn ·  Mar 11 00:43

Anthropic has added a code review feature to Claude Code, directly challenging the $50 billion code security auditing industry. Traditional tools cost up to $50,000 annually and have high false-positive rates, whereas Claude deploys multi-agent teams for parallel reviews at a cost of just $15-25 per review. In internal tests, the proportion of pull requests (PRs) containing substantive review comments surged from 16% to 54%, and engineers accepted over 99% of the findings. The era of AI writing code and reviewing code has arrived.

Just moments ago, Anthropic made another move!

The creator of Claude Code has officially announced a significant update: a new code review feature in Claude Code.

This time, it is targeting a $50 billion industry—code security audits.

Anthropic's newly released feature takes direct, unambiguous aim at the entire code security industry.

Some exclaimed: A $50 billion industry has been overturned by Anthropic overnight!

Now, one can sit back and wait for cybersecurity stocks to plummet.

At Anthropic, nearly every pull request (PR) has been tested using this system.

After months of testing, the results are as follows:

  • The proportion of PRs containing substantial review comments increased from 16% to 54%.

  • Engineers marked fewer than 1% of the review findings as incorrect.

  • In large pull requests (over 1,000 lines), issues were surfaced in 84% of PRs, with an average of 7.5 problems per PR.

Currently, this feature is available as a research preview in beta for Claude Team and Enterprise plans.

A Nightmare for the $50 Billion Market

Anthropic's product has triggered an earthquake of historic proportions across the global AI community and the application security sector (AppSec).

Senior developers have exclaimed that the $50 billion code auditing industry has been disrupted!

This is because, in the past, large companies paid traditional security vendors such as Snyk and Checkmarx annual licensing fees of $50,000 or more, hiring professional teams to scan and audit code so that bugs and security vulnerabilities never reached production.

Now, however, Claude can deploy a team of AI agents to reside in your PRs, standing by 24/7.

Moreover, measured in tokens, the average cost of a single review is only $15 to $25!

$50,000 versus $25: a difference of 2,000 times.
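As a rough sanity check on those figures, the arithmetic can be sketched in a few lines. The token counts and the per-token prices below are illustrative assumptions (the article gives no cost breakdown); the prices mirror published Opus-class API rates, but treat them as placeholders:

```python
# Back-of-the-envelope cost comparison. All inputs are assumed, not
# figures from Anthropic: a review that reads ~1.2M input tokens and
# writes ~60K output tokens at assumed per-million-token prices.
INPUT_PRICE_PER_MTOK = 15.0    # USD per million input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 75.0   # USD per million output tokens (assumed)

def review_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost of a single review in USD."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK)

cost = review_cost(input_tokens=1_200_000, output_tokens=60_000)
annual_license = 50_000.0
print(f"per-review cost: ${cost:.2f}")                      # $22.50
print(f"license-to-review ratio: {annual_license / cost:,.0f}x")
```

Under these assumptions a single review lands squarely in the article's $15-25 band, and the ratio against a $50,000 annual license comes out to roughly 2,000x.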

This is not merely a functional update; it is the sounding of the death knell for traditional code auditing.

Code Review: the most painful part for developers

If you ask any engineering team: what is the biggest bottleneck in software development?

Many people would say it's code review.

Over the past few years, AI’s ability to write code has been advancing rapidly. Whether using GitHub Copilot, Cursor, Claude Code, or ChatGPT, developers leveraging these tools have seen a dramatic surge in their code output.

As a result, a problem arises: code is being produced at a rapid pace, but the number of people reviewing it hasn't increased.

Anthropic found that over the past year, each engineer's code output has increased by 200%, yet many pull requests (PRs) are only skimmed.

Even developers themselves admit that many code reviews are merely going through the motions.

Consequently, numerous bugs, vulnerabilities, and logical issues are being introduced into production environments.

This is why many companies are willing to pay exorbitant prices for security scanning tools.

However, a problem arises – these tools aren't very smart.

What are the issues with traditional code scanning tools?

If you have used traditional AppSec tools, such as Snyk, Checkmarx, Veracode, SonarQube, etc., you are likely to have this feeling: there are too many false positives.

The reason is that most of these tools are based on static rules and known vulnerability databases. They can scan the code but cannot truly understand it.

A common scenario is that the tool alerts for a potential SQL injection risk, but after developers check thoroughly, they find no issue.

As a result, people gradually start to ignore warnings, and truly dangerous problems often slip through the cracks.

Therefore, companies still need extensive manual code review, and what Anthropic has done this time is automate that process.

Anthropic Unveils an AI-Powered Code Review Team

This time, the approach behind Claude Code Review is rather straightforward.

In Claude Code, the system can automatically analyze Pull Requests and conduct inspections from multiple perspectives, such as:

  • Whether the code adheres to project rules

  • Whether there are potential bugs

  • Whether the modifications conflict with the logic of historical code

  • Whether issues raised in previous PRs have reappeared

Ultimately, the system outputs two results: a high-level summary comment and inline comments pinned to specific code locations.

In other words, when you open a PR, you will see an AI review report that highlights truly important issues instead of dozens of pages of verbose details.

The era of 'AI writing code, AI reviewing' has finally arrived.

Claude is beginning to exhibit self-referential, recursive behavior: it reviews the code it helped write.

As AI capabilities grow stronger, the only role left for humans may be to turn the AI on; keyboards might only need a Claude button.

Multi-Agent system activated: the Claude Code Review squad is mobilized.

The biggest feature of Claude Code Review is that it is not a single AI but a team.

When a PR is created, the system automatically initiates a team of AI agents.

According to reports, Claude’s new code review function deploys multiple AI 'review agents' working in parallel, with each agent responsible for different types of checks.

These agents filter out false positives through verification and rank errors by severity. The final result is presented on the PR as a high-signal comprehensive review comment along with inline comments targeting specific errors.

The scale of the review adjusts according to the size of the PR.

Large or complex changes receive more agents and in-depth reviews, while minor changes are quickly approved. According to Anthropic’s testing, the average review time is approximately 20 minutes.
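The article does not specify how the review scales, only that larger or more complex changes get more agents and deeper inspection while trivial changes are approved quickly. That idea can be sketched as a simple function of diff size; every threshold and agent count below is invented for illustration:

```python
def agents_for_pr(changed_lines: int) -> int:
    """Pick how many review agents to launch for a PR.

    Hypothetical scaling rule: the cutoffs (50 and 1,000 lines) echo
    the PR sizes mentioned in the article, but the mapping itself is
    an assumption, not Anthropic's actual policy.
    """
    if changed_lines < 50:      # tiny change: one quick pass
        return 1
    if changed_lines < 1_000:   # typical PR: a small parallel team
        return 3
    return 6                    # large PR: more agents, deeper review
```

A 10-line fix would get a single agent, while a 2,000-line refactor would get the full squad.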

Ultimately, through mutual validation among multiple agents, false positives can be reduced.

During this process, it focuses on identifying logical errors, security vulnerabilities, edge case defects, and hidden regression issues.

All identified issues are tagged by severity level.

  • A red dot marks bugs that should be fixed before merging the code;

  • A yellow dot indicates minor issues, which are recommended for fixing but do not block merging;

  • A purple dot indicates pre-existing issues, i.e., bugs that predate this PR rather than being introduced by it.
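Putting the pieces together — parallel specialist agents, a verification pass that filters false positives, and severity-tagged findings — the review flow the article describes could be sketched as follows. Everything here is an illustrative mock (the agent names, the trivial string checks, and the always-pass verifier are all invented), not Anthropic's implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    # Mirrors the article's three dot colors, ordered by urgency.
    RED = 0     # should be fixed before merging
    YELLOW = 1  # fix recommended; does not block the merge
    PURPLE = 2  # pre-existing issue, not introduced by this PR

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: Severity
    message: str

# Hypothetical specialist agents: each inspects the diff from one angle.
def bug_agent(diff: str) -> list[Finding]:
    out = []
    if "== None" in diff:
        out.append(Finding("app.py", 10, Severity.YELLOW,
                           "use 'is None' instead of '== None'"))
    return out

def security_agent(diff: str) -> list[Finding]:
    out = []
    if "eval(" in diff:
        out.append(Finding("app.py", 20, Severity.RED,
                           "eval() on untrusted input"))
    return out

def verified(finding: Finding, diff: str) -> bool:
    # Stand-in for the cross-validation step that filters false
    # positives; in this mock every finding passes.
    return True

def review(diff: str) -> list[Finding]:
    """Run all agents in parallel, filter findings, rank by severity."""
    agents = [bug_agent, security_agent]
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        batches = pool.map(lambda agent: agent(diff), agents)
    findings = [f for batch in batches for f in batch if verified(f, diff)]
    return sorted(findings, key=lambda f: f.severity)

for f in review("eval(user_input)\nx == None"):
    print(f"[{f.severity.name}] {f.file}:{f.line} {f.message}")
```

The sort at the end is what puts red-dot findings at the top of the report, with yellow and purple items after them.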

Each review comment also includes a collapsible extended reasoning section.

When expanded, you can see:

  • Why Claude flagged the issue

  • How it verified that the issue indeed exists

It is important to note that these comments neither automatically approve nor block PR merging, thus not disrupting the existing code review process.

By default, Claude Code Review primarily focuses on code correctness.

This means it emphasizes checking for:

  • Bugs that could cause production environment failures

  • Actual logic problems

without focusing heavily on issues like code formatting, style preferences, or the absence of tests.

If users wish to expand the scope of checks, manual configuration is required.

The internal test results were nothing short of alarming.

Anthropic's internal test results were alarming, and they underscore just how perfunctory traditional code review has become.

The internal data was shocking: before the feature, only 16% of pull requests (PRs) received substantive review feedback.

In large PRs exceeding 1,000 lines of code, issues were flagged in 84% of them, with an average of 7.5 bugs identified per PR.

Why? The reason is simple: engineers are overwhelmed with work.

Over the past year at Anthropic, each engineer’s code output has increased by 200%. With more and more code being produced, who has the time to meticulously review every line?

After implementing this functionality, the proportion of PRs in the codebase receiving substantive improvement suggestions surged from 16% to 54%.

This means that issues in nearly 40% of PRs previously slipped past human reviewers unnoticed; now Claude catches them.

More alarmingly, for small pull requests with fewer than 50 lines of code, it was previously assumed that such minor changes could hardly cause any issues.

However, 31% of these small modifications were found to contain problems, meaning one out of every three minor changes hid a bug.

Engineers accepted more than 99% of the flagged issues, marking fewer than 1% of the results as false positives.

This level of accuracy surpasses that of the vast majority of human reviewers.

Anthropic provided an internal example: a one-line code change to a production service, which appeared to be a routine operation and typically would have been quickly approved. However, the code review flagged it as a critical issue.

The change would have caused authentication to fail, a failure mode that is easy to overlook in a diff but becomes glaringly obvious once pointed out.

The issue was resolved before merging, and the engineers later admitted they might not have noticed it on their own.

Let us now discuss a real-world case.

iXsystems, the company behind TrueNAS, ran Claude's code review on a refactoring of ZFS encryption.

This was a highly technical modification, reviewed by experts in the field.

As a result, the review did something that surprised everyone: it discovered a potential bug in the "adjacent code."

That bug was not within the core scope of this change; the code was merely touched incidentally by the modification. It was a type mismatch that would silently erase the encryption key cache every time a sync occurred.

It was a bug that had been hidden for a very long time, always there, just undiscovered by anyone.

Human experts could hardly have found it because it wasn't in the diff and wasn't the focus of attention. But one day, it might have blown up your system.

Now, however, Claude's code review has surfaced it.

A major industry shakeout is here.

Now, security companies and SaaS vendors are all lamenting.

How much longer can a code security company charging $50,000 annually survive?

It's not that their technology is bad; it's that the business logic has changed.

If Anthropic can use intelligent agent teams to resolve deep business logic security audits for just $20, who will still buy those traditional scanners costing tens of thousands of dollars with absurdly high false-positive rates?

If you are still manually reviewing thousands of lines of code or paying high security audit fees, it's time to wake up—times have changed.

Tonight, the stocks in the AppSec industry may truly feel the chill brought by AI.

Editor/Stephen

The translation is provided by third-party software.

