
Undercutting the whole market: Tongyi Qianwen's GPT-4-class model drops 97% in price, and 1 yuan now buys 2 million tokens

QbitAI (量子位) · May 21 16:07

Source: QbitAI

Tongyi Qianwen's GPT-4-class model has just undercut the lowest price on the market.

Just now, Alibaba made a major move, officially announcing price cuts for nine Tongyi models.

Among them, Qwen-Long, the flagship model whose performance is comparable to GPT-4, saw its API input price drop from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens. In other words, 1 yuan now buys 2 million tokens, roughly the amount of text in five copies of the Xinhua Dictionary, making it the most cost-effective model in the world.

A more intuitive comparison --

Qwen-Long supports long-text input of up to 10 million tokens, at only 1/400 of GPT-4's price.
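The headline figures can be sanity-checked with quick arithmetic. The sketch below uses only the prices quoted in this article (yuan per thousand tokens); it is an illustration, not an official billing formula.

```python
# Prices from the article, in yuan per 1,000 tokens.
OLD_INPUT_PRICE = 0.02      # Qwen-Long API input, before the cut
NEW_INPUT_PRICE = 0.0005    # Qwen-Long API input, after the cut

# Size of the price drop: (0.02 - 0.0005) / 0.02 = 97.5%,
# which the article rounds to "97%".
drop = (OLD_INPUT_PRICE - NEW_INPUT_PRICE) / OLD_INPUT_PRICE
print(f"input price drop: {drop:.1%}")             # -> 97.5%

# How many tokens 1 yuan buys at the new input price.
tokens_per_yuan = 1 / NEW_INPUT_PRICE * 1000
print(f"tokens per yuan: {tokens_per_yuan:,.0f}")  # -> 2,000,000
```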

The newly released flagship is also on the list: the "super-sized" Qwen-Max has seen its API input price drop 67%, to as low as 0.02 yuan per thousand tokens.

On the open-source side, the input prices of five open-source models, including Qwen1.5-72B and Qwen1.5-110B, have each dropped by more than 75%.

This wave of cuts once again sets a new market low. Call it a 618 shopping festival held exclusively for large-model companies and programmers.

1 yuan buys 2 million tokens

Let's take a look at the specific price reduction situation:

The price cut covers nine models in the Tongyi Qianwen series, spanning both commercial and open-source models.

including:

Qwen-Long, whose performance is comparable to GPT-4: the API input price dropped 97%, from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens; the API output price dropped 90%, from 0.02 yuan per thousand tokens to 0.002 yuan per thousand tokens.

Qwen-Max, whose performance is on par with GPT-4-Turbo on the authoritative OpenCompass benchmark: the API input price dropped 67%, from 0.12 yuan per thousand tokens to 0.04 yuan per thousand tokens.

Among the Qwen1.5 open-source models ranked on the large-model arena leaderboard, Qwen1.5-72B's API input price dropped 75%, from 0.02 yuan per thousand tokens to 0.005 yuan per thousand tokens; its API output price dropped 50%, from 0.02 yuan per thousand tokens to 0.01 yuan per thousand tokens.

Compared with OpenAI's GPT series, the discounted Tongyi Qianwen models cost roughly one-tenth as much, a striking cost advantage.

Take Qwen-Long, which saw the biggest cut, as an example: its price is only 1/400 of GPT-4's, yet its performance metrics are not inferior.

Long text is a particular strength: Qwen-Long supports ultra-long context of up to 10 million tokens, enough to handle documents of roughly 15 million words, or about 15,000 pages. With the simultaneously launched document service, it can also parse and converse over document formats such as Word, PDF, Markdown, EPUB, and MOBI.

Notably, unlike most domestic vendors, which price input and output tokens the same, Qwen-Long's input price was cut more deeply than its output price this time.

In response, Alibaba offered an official explanation:

Asking a model questions over long texts (papers, documents, and so on) has become one of the most common usage patterns, so input volume is often much larger than output volume.

According to the company's statistics, input token volume is typically about eight times output volume. Cutting the price of input tokens, the side users consume most, delivers the biggest savings for enterprises and makes the models more broadly affordable.
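The effect of the asymmetric cut can be illustrated with the 8:1 input/output ratio cited above and Qwen-Long's before/after prices from this article. The workload split below is a hypothetical example, not measured data.

```python
def monthly_cost(input_tokens, output_tokens, in_price, out_price):
    """Monthly bill in yuan, given per-1,000-token prices."""
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# Hypothetical workload matching the cited 8:1 ratio:
# 8M input tokens for every 1M output tokens.
inp, out = 8_000_000, 1_000_000

before = monthly_cost(inp, out, 0.02, 0.02)      # old: input and output both 0.02
after = monthly_cost(inp, out, 0.0005, 0.002)    # new: 0.0005 input / 0.002 output
print(before, after)   # 180.0 vs 6.0 -> a ~30x reduction on this workload
```

Because input dominates the token volume, cutting the input price 97% moves the total bill far more than an equal cut to output would.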

It also hopes this encourages everyone to make full use of long-text capabilities.

When Alibaba moves, it moves big

Speaking of which, this isn't the first time Alibaba Cloud has broken through the industry's floor price.

On February 29 this year, Alibaba Cloud pulled off a sweeping "Crazy Thursday" of its own: prices of all cloud products dropped 20% on average, with the largest cut reaching 55%.

In effect, Alibaba took a knife to its own margins.

The driving force behind such aggressive moves: as China's largest public cloud vendor, Alibaba Cloud has built out complete AI infrastructure and accumulated technical advantages through long-term investment and economies of scale.

Behind this earnest price cut, something larger is visible: in the era of large-model applications, this technology dividend is becoming one of the "killer weapons" of public cloud vendors.

At the AI infrastructure level, from the chip layer to the platform layer, Alibaba Cloud has built a highly flexible AI computing power scheduling system based on self-developed core technologies and products such as heterogeneous chip interconnection, high-performance network HPN7.0, high-performance storage CPFS, and artificial intelligence platform PAI.

For example, PAI supports clusters at the scale of 100,000 accelerator cards, with 96% linear scaling efficiency for hyperscale training. In large-model training tasks, reaching the same results can save more than 50% of compute resources, performance that ranks among the world's best.

In terms of inference optimization, Alibaba Cloud mainly provides three major capabilities:

First, high performance optimization. It includes system-level inference optimization techniques, as well as high-performance operators, efficient inference frameworks, and the ability to compile and optimize.

Second, adaptive tuning. With the diversification of AI applications, it is difficult for a single model to maintain optimal performance in all scenarios. Adaptive inference technology allows the model to dynamically adjust inference technology applications and computational resource selection according to the characteristics of the input data and the constraints of the computational environment.

Third, scalable deployment. Elastic scaling of inference deployment resources absorbs the tidal, peak-and-trough load pattern that inference services exhibit over time.

Earlier, Liu Weiguang, senior vice president of Alibaba Cloud Intelligence Group and president of the Public Cloud Division, also said that the technical dividends and scale effects of public clouds will bring huge cost and performance advantages.

This will promote “public cloud+API to become the mainstream method for enterprises to call big models.”

Mainstream route in the big model application era: public cloud+API

This is the core reason why Alibaba Cloud is once again pushing the big model “price war” to a climax.

Especially for small and medium-sized enterprises and startup teams, public cloud+API has always been regarded as a cost-effective choice to expand model applications:

Although the open source model is developing rapidly, and the strongest models represented by Llama 3 are considered to have performance comparable to GPT-4, private deployment still faces the problem of high costs.

Taking the open-source Qwen-72B model and a monthly usage of 100 million tokens as an example, calling the API directly on Alibaba Cloud's Bailian platform costs only about 600 yuan per month, while the average cost of private deployment exceeds 10,000 yuan per month.
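The ~600 yuan figure can be roughly reproduced from the post-cut Qwen1.5-72B prices quoted earlier (0.005 yuan/1k input, 0.01 yuan/1k output), assuming the ~8:1 input/output split the article cites. This is an illustrative estimate, not Alibaba Cloud's official billing formula.

```python
# Assumed workload: 100M tokens/month, split 8:1 between input and output.
total_tokens = 100_000_000
inp = total_tokens * 8 / 9   # ~88.9M input tokens
out = total_tokens * 1 / 9   # ~11.1M output tokens

# Post-cut Qwen1.5-72B prices from the article, yuan per 1,000 tokens.
api_cost = inp / 1000 * 0.005 + out / 1000 * 0.01
print(round(api_cost))       # ~556 yuan, consistent with the ~600 yuan figure
```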

In addition, the public cloud+API model is also easy to call multiple models and can provide enterprise-level data security. Take Alibaba Cloud as an example. Alibaba Cloud can provide enterprises with an exclusive VPC environment to achieve computing isolation, storage isolation, network isolation, and data encryption. At present, Alibaba Cloud has taken the lead and is deeply involved in the formulation of more than 10 major model security-related international and domestic technical standards.

The openness of cloud vendors also gives developers a richer choice of models and toolchains. For example, in addition to Tongyi Qianwen, the Alibaba Cloud Bailian platform supports hundreds of domestic and foreign models, such as the Llama series, Baichuan, and ChatGLM, and provides a one-stop development environment for large-model applications: a large-model application can be developed in 5 minutes, and an enterprise-grade RAG application built with 5 to 10 lines of code.
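The RAG pattern the platform pitch refers to is: retrieve relevant document chunks, then place them in the prompt sent to the model. The sketch below shows that pattern with a toy keyword retriever; a real Bailian deployment would use the platform's embedding and model APIs instead, and all function names here are illustrative assumptions, not the actual SDK.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by how many lowercased words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a grounded prompt from the top-ranked chunks (the 'A' in RAG)."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Tiny illustrative corpus.
docs = [
    "Qwen-Long supports contexts up to 10 million tokens.",
    "The 618 festival is a June shopping event in China.",
    "Qwen-Max performance is compared with GPT-4-Turbo.",
]
print(build_prompt("How many tokens does Qwen-Long support?", docs))
```

In production, the final prompt would be sent to a hosted model via the platform's API rather than printed.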

QbitAI's think tank noted in its "China AIGC Application Panorama Report" that products built on self-developed vertical models or API access account for nearly 70% of AIGC application products.

This figure indirectly confirms the market potential of the "public cloud + API" model: in the application market, understanding the business and accumulating data are the keys to winning, and building applications on top of public cloud + API is the more realistic choice in terms of both cost and time to launch.

In fact, whether it is the visible price war or the deeper contest over AI infrastructure, both reflect the same shift: as the focus of large-model development moves from foundation models to real-world applications, how platform vendors lower the barrier to using large models has become the key to competition.

Liu Weiguang pointed out:

As the largest cloud computing company in China, Alibaba Cloud has reduced the input price of mainstream big model APIs by 97% this time, hoping to accelerate the explosion of AI applications.

We expect the number of large model API calls to grow tens of thousands of times in the future.

To sum up: for platform vendors, the "price war" is really a contest of infrastructure and technical capability; for the large-model industry as a whole, whether applications can keep exploding in popularity now hinges on entry barriers and operating costs.

Seen this way, the recent wave of price cuts is nothing but good news for developers, and for everyone looking forward to more large-model apps.

What do you think?

Editor: lambor


