
SenseTime Releases 600-Billion-Parameter Multimodal Large Model, with Performance Surpassing GPT-4 Turbo

TMTPost News ·  Apr 24 14:30

Source: Titanium Media

“We believe 2024 will be the year on-device large models take off,” said Xu Li, chairman and CEO of SenseTime.

SenseTime-W (00020.HK) is upgrading the technical capabilities of its large models at an accelerating pace.

On the afternoon of April 23, the listed artificial intelligence (AI) company SenseTime released the “Ririxin” SenseNova 5.0 large-model series in Shanghai. It adopts a Mixture-of-Experts (MoE) architecture, was trained on more than 10T tokens of Chinese and English data plus synthesized reasoning data running to hundreds of billions of tokens, and supports an effective context window of up to 200K during inference. Its knowledge, reasoning, mathematics, and coding capabilities, together with on-device diffusion and language models, are benchmarked comprehensively against GPT-4 Turbo.
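The article names the architecture but does not describe SenseNova's internals. As a rough, purely illustrative sketch of how a Mixture-of-Experts layer routes each token to a few experts (the layer sizes, expert count, and top-2 routing below are assumptions, not SenseTime's actual design), in Python/PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative only)."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)   # router: one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)     # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(8, 1024)                             # 8 tokens of width 1024
print(MoELayer()(x).shape)                           # torch.Size([8, 1024])

Because only a couple of experts run per token, an MoE model can carry a very large total parameter count (here, 600 billion) while keeping per-token compute closer to that of a much smaller dense model.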

SenseTime said this is the industry's first full-stack “cloud, device, and edge” large-model product matrix, meeting application needs across scenarios of different scales. The company is leveraging its technology lead to accelerate the move of generative AI into industrial deployment and to make large models available on demand.

Xu Li, chairman and CEO of SenseTime, said that, following the scaling law, SenseTime will keep advancing its own large-model research and development, continue to explore the KRE three-tier architecture of large-model capability (knowledge, reasoning, execution), and keep pushing the boundaries of what large models can do.

“We believe 2024 will be the year on-device large models take off,” Xu Li said at the event.

Dr. Xu Li, Chairman and CEO of SenseTime

In a conversation with Titanium Media App editors before the launch, Wang Xiaogang, co-founder of SenseTime and president of its Jueying (SenseAuto) intelligent vehicle business group, said the on-device model market has huge potential: smartphones number in the billions, PC shipments run to two or three hundred million units a year, AI PCs can become an assistant for everyone, and automotive intelligence is another major window of opportunity, all of which can put large models into wide use.

“This is also one of SenseTime's strategic priorities this year,” Wang Xiaogang emphasized. “The power of today's models still comes from growth in parameter counts and data, and from the performance gains that come with ever-greater demand for compute. On that basis, future large models will require more and more resources and software and hardware infrastructure. The inevitable result is that, down the line, not that many companies will be building large models; there will be no ‘war of a hundred models’ or ‘a thousand models.’”

According to earlier reports, in March 2023 SenseTime announced a shift in the company's vision and strategy to “take AGI as the core strategic goal, with a view to achieving major breakthroughs in AGI technology within the next few years.”

Based on this shift, SenseTime set out its “AI for All” development goal, with SenseCore, its AI infrastructure (“AI large device”), as the core platform for large-model production and the SenseTime AIDC computing center providing large-scale computing power. On this foundation it builds multimodal models with general capabilities as well as more specialized large models for vertical industries, fundamentally lowering the downstream cost and threshold of applying large models.

On April 10 of last year, SenseTime first announced the “Ririxin” SenseNova large-model family together with its self-developed Chinese large language model application, SenseChat. With parameter counts reaching the hundred-billion level, it supports capabilities and scenario applications such as text generation, image generation, and multimodal content generation.

In July and August of 2023 and in January of this year, the SenseNova models were upgraded to versions 2.0 and 3.0, and models at different parameter scales, such as SenseNova V4.0, SenseChat 2.0, and the lightweight SenseChat S, were released along the way to meet the needs of different terminals and scenarios such as mobile devices. By improving training-data quality and significantly strengthening basic language ability, the new SenseNova V4.0 matches GPT-4 in scenarios such as code writing, data analysis, and medical Q&A. SenseTime also open-sourced models at the 7B and 20B parameter scales.

In March of this year, Xu Li said that, guided by the scaling law, large models are in a golden period of technological revolution and performance improvement. Since its release in 2023, the SenseNova family's capabilities have improved markedly every three months; SenseTime has achieved large-model training capability at the ten-thousand-GPU scale, and is at the domestic leading level in base models, multimodality, programming and tool calling, lossless million-character context, and lightweight on-device models.

According to its latest 2023 annual report, SenseTime's generative-AI revenue reached 1.2 billion yuan last year, up a rapid 200%. The total computing power of SenseTime's large-model infrastructure grew to 12,000 petaFLOPS, with 45,000 GPUs in operation; domestically produced computing power reached 2,000 petaFLOPS, and 58 types of domestic chips have been adapted and put to use. In addition, more than 70% of generative-AI customers were new to SenseTime within the past 12 months, while average spending by the roughly 30% of existing customers grew by about 50%. As of March, dozens of customers had placed orders of more than 10 million yuan, and consumer-side calls to SenseNova had grown nearly 120-fold.

On April 23, at SenseTime's Tech Day, the company officially released SenseNova 5.0, making it the first domestic company able to establish a full-stack “cloud, device, and edge” large-model product matrix. Based on the event, Titanium Media App has sorted SenseTime's work into four core technology areas:

1. Cloud model

SenseTime's hundred-billion-parameter SenseChat model was upgraded to version 5.0 and now reaches 600 billion parameters. With MoE support, its creative writing, reasoning, and summarization abilities are greatly improved: given the same injected Chinese knowledge, it delivers better understanding, summarization, and question answering, while its mathematics, coding, and reasoning abilities reach industry-leading levels. On the multimodal side, it supports analysis and understanding of long high-definition images and interactive text-to-image generation, enabling complex cross-document knowledge extraction, summary-style Q&A presentation, and rich multimodal interaction, with its overall score leading on MMBench.

On mainstream objective benchmarks, SenseNova 5.0 matches or surpasses the GPT-4 Turbo version OpenAI released at its developer conference last year, and also outperforms the recently released Llama 3-70B almost across the board.

Xu Li said that in advanced reasoning, and especially in mathematics, SenseNova improves on GPT-4 by more than 100% and on Llama 2 and Llama 3 by more than 400%. Much of that gain comes from improving data quality, in particular using synthesized reasoning data to strengthen reasoning ability.

2. On-device and edge models

With the rapid development of large-model technology, demand from different application scenarios is becoming clearer, and running large AI models on terminal devices such as smartphones, PCs, and VR glasses has become a major trend. SenseTime therefore launched SenseChat-Lite, with a parameter scale of 1.8B (1.8 billion).

In benchmark tests, the on-device model comprehensively surpasses models of the same size such as MiniCPM-2B and Phi-2, and also outperforms some models at the 7B and 13B scale.

Xu Li said it delivers the best performance at its scale and leads across the board. “For on-device large models, as the martial-arts saying goes, speed alone is unbeatable.”

At the same time, SenseTime launched a device-cloud collaboration solution that plays to the respective strengths of device and cloud through intelligent routing and coordination. In some scenarios, more than 80% of processing is handled on-device, significantly reducing inference costs. The inference speed of SenseNova's on-device large language model is currently the fastest in the industry, averaging 18.3 characters per second on mid-range platforms and reaching 78.3 characters per second on flagship platforms. For text-to-image, the on-device diffusion model's inference time is under 1.5 seconds, around 10 times faster than cloud-based applications, and it supports output of high-definition images of 12 million pixels and above, along with fast on-device editing functions such as proportional expansion, free expansion, and rotational expansion of images.
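The article does not explain how the device-cloud collaboration decides where a request runs. The sketch below is only a hypothetical illustration of that kind of routing: the Router class, its thresholds, and the on_device_model / cloud_model callables are assumptions for illustration, not SenseTime's implementation.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    """Send a request to the on-device model when it looks cheap enough to
    handle locally; otherwise fall back to the cloud (illustrative heuristic)."""
    on_device_model: Callable[[str], str]
    cloud_model: Callable[[str], str]
    max_prompt_chars: int = 2000        # hypothetical cutoff for local handling
    battery_floor: float = 0.2          # avoid local inference on low battery

    def route(self, prompt: str, battery: float, needs_tools: bool) -> str:
        run_locally = (
            len(prompt) <= self.max_prompt_chars
            and battery >= self.battery_floor
            and not needs_tools          # tool/plugin calls go to the cloud
        )
        model = self.on_device_model if run_locally else self.cloud_model
        return model(prompt)

# Usage with stand-in models:
router = Router(on_device_model=lambda p: "[device] " + p,
                cloud_model=lambda p: "[cloud] " + p)
print(router.route("Summarize this short note", battery=0.8, needs_tools=False))

If a heuristic of this kind keeps short, tool-free requests on the device, the bulk of inference stays local, which is the effect the article attributes to the scheme (over 80% of processing handled on-device in some scenarios).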

Wang Xiaogang told Titanium Media AGI that SenseTime has built a stronger model on the device side. Over the past year, while serving mobile-phone and automotive customers from the cloud, the company also did extensive improvement and development work to meet the needs of these end users. The overall quality is on par with the cloud model, and it brings a particularly large advantage in user experience. By using sparse activation this time, both computational cost and the power consumption of the on-device model can be greatly reduced.

3. Enterprise all-in-one machine

At the event, in response to enterprise-level large-model application needs in industries such as finance, coding, healthcare, and government, SenseTime officially launched an enterprise-grade large-model all-in-one machine. It simultaneously supports hardware acceleration for hundred-billion-parameter models and for knowledge retrieval, enabling on-premises deployment that works out of the box and lowering the threshold for enterprises to adopt large models. Compared with similar products in the industry, inference costs are cut by 80%, retrieval is greatly accelerated, and CPU load is reduced by 50%.

Specifically, the SenseTime enterprise-grade large-model all-in-one machine uses a “2-42” architecture and is a high-density machine with the strongest performance in its class: high-speed 4-card interconnect, up to 256GB of GPU memory, 448Gb/s of interconnect bandwidth, and 2 PFLOPS of computing power at half precision.
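As a back-of-the-envelope check on those specs, the short calculation below estimates whether a hundred-billion-parameter model fits in 256GB of GPU memory at different weight precisions; the 20% overhead factor and the quantization options are assumptions for illustration, not published details of the machine.

# Rough memory estimate for hosting a 100-billion-parameter model in 256 GB
# of total GPU memory (4 cards), at different weight precisions.
PARAMS = 100e9            # "hundred-billion" class model
TOTAL_VRAM_GB = 256       # total memory across the 4 cards described above
OVERHEAD = 1.2            # assumed 20% extra for KV cache, activations, buffers

for name, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    need_gb = PARAMS * bytes_per_param * OVERHEAD / 1e9
    verdict = "fits" if need_gb <= TOTAL_VRAM_GB else "does not fit"
    print(f"{name}: ~{need_gb:.0f} GB needed -> {verdict} in {TOTAL_VRAM_GB} GB")

Under these assumptions a 100B-parameter model squeezes into 256GB even at FP16 (about 240GB with overhead) and fits comfortably once quantized, which is consistent with the machine's positioning for local deployment of hundred-billion-scale models.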

Among them, the lightweight version of the Code Raccoon code-model all-in-one machine is priced from 350,000 yuan per unit. Xu Li said the product's advantages include cost-effectiveness, ease of use, security, and a low barrier to entry.

4. Large model agents and applications

Overall, on top of the SenseCore AI infrastructure and the “Ririxin” SenseNova model family, SenseTime has developed a range of generative-AI products, including SenseChat, the text-to-image tool Miaohua, the digital-human product Ruying, Gewu, Qiongyu, the medical model Dayi, and the latest Raccoon family, all of which have been updated alongside version 5.0.

Take the Raccoon series as an example. SenseTime's new Raccoon products now run on-device, and the family has grown beyond Code Raccoon to include Office Raccoon and others, supporting more application scenarios; Miaohua has also been fully upgraded to support more fine-grained prompt terms.

Xu Li said the Code Raccoon released now works end to end, but writing code purely from natural language is not yet fully automated, because human natural language naturally carries ambiguity.

In addition, SenseTime also announced new technical breakthroughs in areas such as text-to-video.

Xu Li played three videos generated entirely by SenseTime's large model live on stage, emphasizing the text-to-video platform's controllability over characters, actions, and scenes, although the text-to-video product has not been officially released. For digital humans, SenseTime also released an anthropomorphic language model that makes virtual characters feel more real.

Xu Li said SenseTime's team hopes to keep driving change across the AI industry, especially in the AI 2.0 era.

Wang Xiaogang told Titanium Media App that the models do not exist in isolation: the SenseTime multimodal model is built on the language model, and the text-to-video model borrows heavily, in network architecture, data production pipeline, and R&D process, from last year's SenseTime text-to-image model, so the models are interrelated. At the same time, SenseTime has a great deal of accumulated groundwork behind them.

“The most important thing in large-model development today is not how many models survive; what matters most is differentiation. So whether we talk about fighting a war or about ‘involution,’ it mainly reflects a lack of differentiation. How do we differentiate? Through innovation. There are still problems with how models are developed today: when OpenAI releases a model and everyone knows what that kind of model can do, people flock to build similar things; when no one has tried something or produced anything yet, the uncertainty and investment risk are high and very few people invest there. So you can see that SenseTime's approach to large-model development is not the same as many other companies',” Wang Xiaogang said.

Wang Xiaogang also mentioned to Titanium Media App that Jueying (SenseAuto), the smart-vehicle business, is a development focus: it will not only have autonomy and greater investment but will also draw on support from SenseTime's foundational capabilities. For SenseTime as a whole, however, the focus of future development is not building physical robots but building the “brain” of robots.

“This is where we are strongest and where we should put our value to best use,” Wang Xiaogang said.

Wang Xiaogang emphasized that AI large models are a long-term investment and a long-term competition. SenseTime remains very determined to keep moving forward on this path and, on that basis, will work with many partners and the broader ecosystem to support these capabilities and achievements.

editor/tolk


