
DeepSeek Stuns the World; America's Two AI Model Giants Respond: It Is Not More Advanced Than Ours

wallstreetcn ·  Jan 30 09:53

Source: Wall Street News

Anthropic CEO Amodei argues that DeepSeek's reduced training costs are in line with industry trends and do not represent a groundbreaking technological achievement: if AI training costs fall by roughly 4x each year, and DeepSeek-V3's training cost is about 8 times lower than that of the US models developed a year ago, then it is entirely consistent with the normal trend... Even accepting DeepSeek's training cost figures, they merely sit on the trend line, and may not even fully reach it.

The emergence of DeepSeek R1 has introduced new variables into the global AI industry. In response, the two major US AI players, Anthropic and OpenAI, reacted quickly, seeking to ease market concerns about their technological lead.

On Wednesday, Anthropic CEO Dario Amodei published a long essay on DeepSeek's progress, arguing that DeepSeek did not "achieve for 6 million dollars what costs US AI companies billions." Taking Anthropic as an example, Claude 3.5 Sonnet is a medium-sized model whose training cost was in the tens of millions of dollars, nowhere near the billion-dollar level.

He believes that the reduction in DeepSeek's training costs aligns with industry trends and does not represent a groundbreaking technological achievement.

If AI training costs fall at a rate of about 4x per year, and DeepSeek-V3's training cost is roughly 8 times lower than that of the US models developed a year ago, then it is entirely consistent with the normal trend... Even accepting DeepSeek's training cost figures, they merely sit on the trend line, and may not even fully reach it.

The day before, OpenAI chief researcher Mark Chen also responded to DeepSeek R1, in a tone that was complimentary yet carefully measured.

Chen acknowledged that DeepSeek "independently discovered some of the core concepts that OpenAI developed during the o1 model R&D process." However, Chen quickly shifted the focus to cost issues, believing that "the external interpretation of the cost advantage is somewhat excessive."

However, Gary Marcus, a professor at NYU and an AI expert, believes that DeepSeek may have a greater impact on OpenAI than imagined.

Anthropic CEO: DeepSeek did not break industry trends.

Amodei first broke down the three laws governing AI development:

  1. Scaling laws

A core feature of AI is that scale drives performance. My co-founders and I were among the first to document this while working at OpenAI. All else being equal, the larger the training run, the more consistently and strongly AI performs across a range of cognitive tasks. For example, a model trained for 1 million dollars might solve 20% of key coding tasks, a 10-million-dollar model might reach 40%, and a 100-million-dollar model might improve to 60%. These gaps often matter enormously in practice—another tenfold increase in compute can mean the leap from undergraduate-level to PhD-level skill. That is why companies are pouring huge sums into training ever-larger models.
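To make the scaling relationship concrete, here is a minimal sketch that fits a toy log-linear curve through the three illustrative figures in the paragraph above. The dollar-to-accuracy numbers come from the text; the log-linear form and the extrapolation to 1 billion dollars are assumptions for illustration only, not Anthropic's actual scaling data.

```python
import math

# Illustrative figures from the paragraph above (assumed, not measured):
# training spend in dollars -> share of key coding tasks solved.
examples = {1e6: 0.20, 1e7: 0.40, 1e8: 0.60}

def solve_rate(cost_usd: float) -> float:
    """Toy log-linear scaling curve through the three example points:
    every 10x increase in training spend adds ~20 percentage points."""
    return 0.20 + 0.20 * math.log10(cost_usd / 1e6)

for cost, quoted in examples.items():
    print(f"${cost:,.0f}: ~{solve_rate(cost):.0%} of tasks (example in text: {quoted:.0%})")

# One more 10x step (to $1B) would reach ~80% under this toy curve --
# the kind of jump Amodei compares to going from undergraduate to PhD level.
print(f"$1,000,000,000: ~{solve_rate(1e9):.0%} (extrapolation, assumption only)")
```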

  2. Declining compute costs

New optimization ideas keep emerging in AI, making model training more efficient. They may involve architectural improvements (such as optimizations to the Transformer) or gains in underlying hardware efficiency. Such innovations lower training costs: if an innovation doubles compute efficiency, a training run that once required 10 million dollars can now be completed for 5 million dollars.

Every leading AI company keeps discovering optimizations of this kind, typically worth 1.2x, sometimes 2x, and occasionally as much as 10x. Because smarter AI is so valuable, these efficiency gains are almost always reinvested in training stronger models rather than in lowering overall spending—in other words, companies simply pour even more resources into training at larger scale.

Historically, thanks to improvements in algorithms and hardware, the compute cost of AI training has fallen by roughly 4x per year. This means that through normal industry development alone, training a given model in 2024 should cost 3 to 4 times less than in 2023.

At the same time, the decrease in training costs has also driven down inference costs. For example, Claude 3.5 Sonnet was released 15 months later than GPT-4 but has outperformed it in nearly all benchmark tests, and the API price has also dropped by about 10 times.

  3. Shifts in training paradigms

AI training methods are continuously evolving. From 2020 to 2023, the main way the industry scaled was by enlarging pre-training, that is, training models on massive amounts of internet text followed by a small amount of additional training. In 2024, however, reinforcement learning (RL) training became the new key breakthrough, significantly improving AI performance on reasoning tasks such as mathematics and programming competitions. The o1-preview model released by OpenAI in September, for example, used this technique.

We are still in the early stages of scaling up RL training. At this stage, even an extra 1 million dollars spent on RL training can yield outsized gains. Companies are racing to scale RL, but AI is currently at a unique crossover point—meaning that as long as a company starts from a strong enough base, several players can produce similarly performing models in the short term.

Amodei pointed out that these three points help make sense of DeepSeek's recent releases. About a month ago, DeepSeek launched DeepSeek-V3, a pre-training-only model. Then, last week, it released R1, which adds the second stage of reinforcement learning training.

Amodei said that DeepSeek-V3 is in fact an innovative model worth paying attention to. As a pre-trained model, it approaches the performance of the leading US models on certain tasks while significantly lowering training costs, even though on real-world tasks such as coding, Claude 3.5 Sonnet remains far ahead. The DeepSeek team made some genuinely impressive engineering optimizations in key-value cache management and in its mixture-of-experts architecture.
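For readers unfamiliar with the term, a "mixture-of-experts architecture" means a router sends each token to only a few expert sub-networks, so only a fraction of the model's parameters are active per token. Below is a minimal, generic top-k routing sketch in PyTorch; it illustrates the general technique only and is not DeepSeek's implementation—the class name, layer sizes, and expert count are all made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative sketch only).

    Each token is routed to its top_k highest-scoring experts, so only a
    fraction of the layer's parameters do work for any given token. All
    dimensions here are arbitrary choices for the example.
    """
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                   # 10 tokens, d_model = 64
print(ToyMoELayer()(tokens).shape)             # torch.Size([10, 64])
```

The general appeal of this design is that total parameter count can grow without a proportional increase in per-token compute, which is one of the levers behind lower training costs.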

However, Amodei believes a few points need clarification:

DeepSeek did not "achieve for 6 million dollars what costs American AI companies billions." Taking Anthropic as an example, Claude 3.5 Sonnet is a medium-scale model whose training cost was in the tens of millions of dollars, nowhere near the billion-dollar level. Moreover, Claude 3.5 Sonnet was trained 9-12 months ago, while DeepSeek's model was trained in November-December 2024, and even so Claude 3.5 Sonnet still leads significantly on several key evaluations.

DeepSeek's training costs have not broken through industry trend lines. If the historical cost curve falls by roughly 4x per year, then in the ordinary course of business—the cost-decline trend of 2023 and 2024—we should by now expect a model 3 to 4 times cheaper than 3.5 Sonnet/GPT-4. Yet DeepSeek-V3's performance is slightly below those leading American models—assume a gap of about 2x on the scaling curve, which is already a rather generous estimate for DeepSeek-V3—which means that if DeepSeek-V3's training cost is about 8 times lower than that of the American models developed a year ago, it is entirely consistent with the normal trend (a back-of-the-envelope version of this arithmetic appears after these points).

Although exact figures cannot be given, the preceding analysis shows that even accepting DeepSeek's training cost data, they merely sit on the trend line, and may not even fully reach it. For instance, this is smaller than the inference price gap between the original GPT-4 and Claude 3.5 Sonnet (about 10x), and 3.5 Sonnet is itself a better model than GPT-4. All of this indicates that DeepSeek-V3 is not a unique breakthrough and has not fundamentally changed the economics of LLMs; it is simply an expected point on the continuing cost-reduction curve. What is different is that this time, the first company to demonstrate the expected cost reduction is Chinese, which has never happened before and carries significant geopolitical implications. American companies will soon follow this trend—not by copying DeepSeek, but because they too are moving along the usual cost-reduction trajectory.

DeepSeek is not the first company to achieve cost optimization, but it is the first company from China. This is geopolitically significant. Nevertheless, American AI companies will also soon follow suit—not by copying DeepSeek, but because they are already advancing along the same technical path.
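As referenced in the second point above, the trend argument reduces to simple arithmetic. The sketch below merely restates Amodei's own assumptions (a ~4x per year decline in training cost, an assumed ~2x gap on the scaling curve, and DeepSeek's claimed ~8x cost reduction); none of the figures are independently verified here.

```python
# Amodei's back-of-the-envelope argument, restated with his own assumptions
# (figures as quoted in the article, not independently verified):

annual_cost_decline = 4    # training cost falls ~4x per year on the historical trend
scaling_gap = 2            # assumed gap between DeepSeek-V3 and year-old US frontier
                           # models on the scaling curve (a generous estimate, per Amodei)

# On trend, matching last year's frontier should cost ~4x less; a model sitting
# ~2x lower on the scaling curve should cost roughly another ~2x less.
trend_implied_reduction = annual_cost_decline * scaling_gap

claimed_reduction = 8      # DeepSeek-V3's claimed cost advantage vs. year-old US models

print(f"Trend-implied cost reduction: ~{trend_implied_reduction}x")
print(f"Claimed cost reduction:       ~{claimed_reduction}x")
# Both come out near 8x, which is why Amodei argues the claim sits on the
# trend line rather than breaking it.
```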

Furthermore, Amodei pointed out that DeepSeek holds about 50,000 Hopper-generation GPUs, worth roughly 1 billion dollars—about 2-3 times fewer than the chips held by the major American AI companies. This means DeepSeek's overall investment is not dramatically smaller than that of American AI labs.

Amodei said the attention drawn by the newly released R1 (which even knocked NVIDIA's stock down 17%) is not because it is more innovative than V3 technically; its reinforcement learning training essentially replicates what OpenAI did with o1-preview. Because RL scaling is still at an early stage, several companies can currently produce models at a similar level, but that situation will not last—the leaders will quickly pull away again as scaling continues.

OpenAI executive: The external interpretation of cost advantages is somewhat overstated.

OpenAI chief researcher Mark Chen congratulated DeepSeek on its achievement on social media:

"Congratulations to DeepSeek on successfully developing the o1-level inference model! Their research paper indicates that they independently discovered some core ideas we used in achieving o1."

However, Chen immediately shifted the focus to cost issues, believing that "the external interpretation of cost advantages is somewhat exaggerated." He proposed the concept of "dual-axis optimization" (pre-training and reasoning), implying that OpenAI is equally capable in cost control.

Chen also mentioned the maturity of "distillation technology" and the trend of "decoupling cost and capability," emphasizing OpenAI's exploration in model compression and optimization technologies. He specifically pointed out that "low-cost service models (especially under higher latency) do not necessarily imply stronger model capabilities."

Finally, Chen stated that OpenAI will continue to pursue "reducing costs" and "enhancing capabilities" in a "dual approach" and promised that "this year will see the release of even better models."

AI Expert: DeepSeek poses a threat to OpenAI.

Gary Marcus, a professor at New York University and AI expert, believes that the emergence of DeepSeek poses a substantial threat to OpenAI.

He pointed out that "DeepSeek essentially offers for free what OpenAI wants to charge for." Marcus believes this may severely impact OpenAI's business model.

Marcus also emphasized that DeepSeek is more open than OpenAI, which will attract more talent. He questioned OpenAI's valuation of 157 billion USD, believing that under an annual loss of approximately 5 billion USD, this valuation is difficult to justify.

Editor: jayden
