
Undercutting the whole market: Tongyi Qianwen's GPT-4-class model drops 97% in price, and 1 yuan now buys 2 million tokens

QbitAI (量子位) · May 21 16:07

Source: QbitAI

Tongyi Qianwen's GPT-4-class model has just undercut the lowest price on the market.

Just now, Alibaba made a major move, officially announcing price cuts for nine Tongyi models.

Among them, Qwen-Long, the flagship model whose performance is comparable to GPT-4, saw its API input price drop from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens. In other words, 1 yuan now buys 2 million tokens, roughly the amount of text in five copies of the Xinhua Dictionary, making it the most cost-effective model in the world.

A more intuitive comparison --

Qwen-Long supports long-text input of up to 10 million tokens, at only 1/400 of GPT-4's price.
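The headline figures can be sanity-checked with quick arithmetic. The sketch below uses only the prices quoted in this article (yuan per thousand tokens); it is an illustration, not an official billing formula.

```python
# Prices from the article, in yuan per 1,000 tokens.
OLD_INPUT_PRICE = 0.02      # Qwen-Long API input, before the cut
NEW_INPUT_PRICE = 0.0005    # Qwen-Long API input, after the cut

# Size of the price drop: (0.02 - 0.0005) / 0.02 = 97.5%,
# which the article rounds to "97%".
drop = (OLD_INPUT_PRICE - NEW_INPUT_PRICE) / OLD_INPUT_PRICE
print(f"input price drop: {drop:.1%}")             # -> 97.5%

# How many tokens 1 yuan buys at the new input price.
tokens_per_yuan = 1 / NEW_INPUT_PRICE * 1000
print(f"tokens per yuan: {tokens_per_yuan:,.0f}")  # -> 2,000,000
```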

The newly released flagship is also on the list: the "super-sized" Qwen-Max has seen its API input price drop 67%, to as low as 0.02 yuan per thousand tokens.

On the open-source side, the input prices of five open-source models, including Qwen1.5-72B and Qwen1.5-110B, have each dropped by more than 75%.

This wave of cuts once again sets a new market low. Call it a 618 shopping festival held exclusively for large-model companies and programmers.

1 yuan buys 2 million tokens

Let's take a look at the specific price reduction situation:

The price cut covers nine models in the Tongyi Qianwen series, spanning both commercial and open-source models.

including:

Qwen-Long, whose performance is comparable to GPT-4: the API input price dropped 97%, from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens; the API output price dropped 90%, from 0.02 yuan per thousand tokens to 0.002 yuan per thousand tokens.

Qwen-Max, whose performance is on par with GPT-4-Turbo on the authoritative OpenCompass benchmark: the API input price dropped 67%, from 0.12 yuan per thousand tokens to 0.04 yuan per thousand tokens.

Among the Qwen1.5 open-source models ranked on the large-model arena leaderboard, Qwen1.5-72B's API input price dropped 75%, from 0.02 yuan per thousand tokens to 0.005 yuan per thousand tokens; its API output price dropped 50%, from 0.02 yuan per thousand tokens to 0.01 yuan per thousand tokens.

Compared with OpenAI's GPT series, the discounted Tongyi Qianwen models cost roughly one-tenth as much, a striking cost advantage.

Take Qwen-Long, which saw the biggest cut, as an example: its price is only 1/400 of GPT-4's, yet its performance metrics are not inferior.

Long text is a particular strength: Qwen-Long supports ultra-long context of up to 10 million tokens, enough to handle documents of roughly 15 million words, or about 15,000 pages. With the simultaneously launched document service, it can also parse and converse over document formats such as Word, PDF, Markdown, EPUB, and MOBI.

Notably, unlike most domestic vendors, which price input and output tokens the same, Qwen-Long's input price was cut more deeply than its output price this time.

In response, Alibaba offered an official explanation:

Asking a model questions over long texts (papers, documents, and so on) has become one of the most common usage patterns, so input volume is often much larger than output volume.

According to the company's statistics, input token volume is typically about eight times output volume. Cutting the price of input tokens, the side users consume most, delivers the biggest savings for enterprises and makes the models more broadly affordable.
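The effect of the asymmetric cut can be illustrated with the 8:1 input/output ratio cited above and Qwen-Long's before/after prices from this article. The workload split below is a hypothetical example, not measured data.

```python
def monthly_cost(input_tokens, output_tokens, in_price, out_price):
    """Monthly bill in yuan, given per-1,000-token prices."""
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# Hypothetical workload matching the cited 8:1 ratio:
# 8M input tokens for every 1M output tokens.
inp, out = 8_000_000, 1_000_000

before = monthly_cost(inp, out, 0.02, 0.02)      # old: input and output both 0.02
after = monthly_cost(inp, out, 0.0005, 0.002)    # new: 0.0005 input / 0.002 output
print(before, after)   # 180.0 vs 6.0 -> a ~30x reduction on this workload
```

Because input dominates the token volume, cutting the input price 97% moves the total bill far more than an equal cut to output would.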

It also hopes this encourages everyone to make full use of long-text capabilities.

When Alibaba moves, it moves big

Speaking of which, this isn't the first time Alibaba Cloud has broken through the industry's floor price.

On February 29 this year, Alibaba Cloud pulled off a sweeping "Crazy Thursday" of its own: prices of all cloud products dropped 20% on average, with the largest cut reaching 55%.

In effect, Alibaba took a knife to its own margins.

The driving force behind such aggressive moves: as China's largest public cloud vendor, Alibaba Cloud has built out complete AI infrastructure and accumulated technical advantages through long-term investment and economies of scale.

Behind this earnest price cut, something larger is visible: in the era of large-model applications, this technology dividend is becoming one of the "killer weapons" of public cloud vendors.

At the AI infrastructure level, from the chip layer to the platform layer, Alibaba Cloud has built a highly flexible AI computing power scheduling system based on self-developed core technologies and products such as heterogeneous chip interconnection, high-performance network HPN7.0, high-performance storage CPFS, and artificial intelligence platform PAI.

For example, PAI supports clusters at the scale of 100,000 accelerator cards, with 96% linear scaling efficiency for hyperscale training. In large-model training tasks, reaching the same results can save more than 50% of compute resources, performance that ranks among the world's best.

In terms of inference optimization, Alibaba Cloud mainly provides three major capabilities:

First, high performance optimization. It includes system-level inference optimization techniques, as well as high-performance operators, efficient inference frameworks, and the ability to compile and optimize.

Second, adaptive tuning. With the diversification of AI applications, it is difficult for a single model to maintain optimal performance in all scenarios. Adaptive inference technology allows the model to dynamically adjust inference technology applications and computational resource selection according to the characteristics of the input data and the constraints of the computational environment.

Third, scalable deployment. Elastic scaling of inference deployment resources absorbs the tidal, peak-and-trough load pattern that inference services exhibit over time.

Earlier, Liu Weiguang, senior vice president of Alibaba Cloud Intelligence Group and president of the Public Cloud Division, also said that the technical dividends and scale effects of public clouds will bring huge cost and performance advantages.

This will promote “public cloud+API to become the mainstream method for enterprises to call big models.”

Mainstream route in the big model application era: public cloud+API

This is the core reason why Alibaba Cloud is once again pushing the big model “price war” to a climax.

Especially for small and medium-sized enterprises and startup teams, public cloud+API has always been regarded as a cost-effective choice to expand model applications:

Although the open source model is developing rapidly, and the strongest models represented by Llama 3 are considered to have performance comparable to GPT-4, private deployment still faces the problem of high costs.

Taking the open-source Qwen-72B model and a monthly usage of 100 million tokens as an example, calling the API directly on Alibaba Cloud's Bailian platform costs only about 600 yuan per month, while the average cost of private deployment exceeds 10,000 yuan per month.
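The ~600 yuan figure can be roughly reproduced from the post-cut Qwen1.5-72B prices quoted earlier (0.005 yuan/1k input, 0.01 yuan/1k output), assuming the ~8:1 input/output split the article cites. This is an illustrative estimate, not Alibaba Cloud's official billing formula.

```python
# Assumed workload: 100M tokens/month, split 8:1 between input and output.
total_tokens = 100_000_000
inp = total_tokens * 8 / 9   # ~88.9M input tokens
out = total_tokens * 1 / 9   # ~11.1M output tokens

# Post-cut Qwen1.5-72B prices from the article, yuan per 1,000 tokens.
api_cost = inp / 1000 * 0.005 + out / 1000 * 0.01
print(round(api_cost))       # ~556 yuan, consistent with the ~600 yuan figure
```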

In addition, the public cloud+API model is also easy to call multiple models and can provide enterprise-level data security. Take Alibaba Cloud as an example. Alibaba Cloud can provide enterprises with an exclusive VPC environment to achieve computing isolation, storage isolation, network isolation, and data encryption. At present, Alibaba Cloud has taken the lead and is deeply involved in the formulation of more than 10 major model security-related international and domestic technical standards.

The openness of cloud vendors also gives developers a richer choice of models and toolchains. For example, in addition to Tongyi Qianwen, the Alibaba Cloud Bailian platform supports hundreds of domestic and foreign models, such as the Llama series, Baichuan, and ChatGLM, and provides a one-stop development environment for large-model applications: a large-model application can be developed in 5 minutes, and an enterprise-grade RAG application built with 5 to 10 lines of code.
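The RAG pattern the platform pitch refers to is: retrieve relevant document chunks, then place them in the prompt sent to the model. The sketch below shows that pattern with a toy keyword retriever; a real Bailian deployment would use the platform's embedding and model APIs instead, and all function names here are illustrative assumptions, not the actual SDK.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by how many lowercased words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a grounded prompt from the top-ranked chunks (the 'A' in RAG)."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Tiny illustrative corpus.
docs = [
    "Qwen-Long supports contexts up to 10 million tokens.",
    "The 618 festival is a June shopping event in China.",
    "Qwen-Max performance is compared with GPT-4-Turbo.",
]
print(build_prompt("How many tokens does Qwen-Long support?", docs))
```

In production, the final prompt would be sent to a hosted model via the platform's API rather than printed.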

QbitAI's think tank noted in its "China AIGC Application Panorama Report" that products built on self-developed vertical models or API access account for nearly 70% of AIGC application products.

This figure indirectly confirms the market potential of the "public cloud + API" model: in the application market, understanding the business and accumulating data are the keys to winning, and building applications on top of public cloud + API is the more realistic choice in terms of both cost and time to launch.

In fact, whether it is the visible price war or the deeper contest over AI infrastructure, both reflect the same shift: as the focus of large-model development moves from foundation models to real-world applications, how platform vendors lower the barrier to using large models has become the key to competition.

Liu Weiguang pointed out:

As the largest cloud computing company in China, Alibaba Cloud has reduced the input price of mainstream big model APIs by 97% this time, hoping to accelerate the explosion of AI applications.

We expect the number of large model API calls to grow tens of thousands of times in the future.

To sum up: for platform vendors, the "price war" is really a contest of infrastructure and technical capability; for the large-model industry as a whole, whether applications can keep exploding in popularity now hinges on entry barriers and operating costs.

Seen this way, the recent wave of price cuts is nothing but good news for developers, and for everyone looking forward to more large-model apps.

What do you think?

Editor: lambor


