
China Merchants Securities: Text-to-video model Sora exceeds expectations, driving demand for computing power network construction

Zhitong Finance ·  Feb 20 13:57

The Zhitong Finance App learned that China Merchants Securities released a research report saying that Sora opens up AIGC application space in the vision domain, and that the continued shortage of computing power network supply is driving demand for hardware infrastructure construction. The bank estimates that training the Sora model requires about 70,900 H100 GPUs running for one month. On the inference side, according to relevant research estimates, generating one image consumes compute roughly equivalent to generating 256 words of text; from this, the bank estimates that generating a one-minute short video consumes about a thousand times more computing power than generating a text conversation. The bank believes that computing power will remain in short supply in the short to medium term and will not fully meet inference-side demand.

Event: On February 16, OpenAI launched the text-to-video model Sora, which can create realistic and imaginative scenes from text instructions and generate high-definition videos up to one minute long of complex scenes with multiple characters, specific types of motion, and accurate details of subjects and backgrounds. Sora's better-than-expected performance demonstrates the effectiveness of the Transformer architecture in the vision domain, laying the foundation for accelerated iteration of visual models.

The views of China Merchants Securities are as follows:

The Sora model's showcased results are stunning, establishing a milestone for visual models.

Unlike previous visual models, OpenAI's Sora is a general-purpose model for visual data. By giving the model many frames to predict at once, it solves the challenging problem of keeping a subject consistent even when it is temporarily out of view. It can generate videos and images of varying duration, aspect ratio, and resolution, and can output up to a minute of high-definition video. Sora's core strengths are consistency, flexibility, and stability. Sora can flexibly generate images at various resolutions and in various formats, generate videos from images, and extend existing video content into new videos. Compared with other models, Sora maintains subject consistency across frames even in videos up to one minute long, which previous visual models struggled to do. Sora also shows an ability to follow the laws of physics: without explicit constraints, its generated footage largely obeys physical rules, making it more realistic.

Sora is the "GPT-3 moment" for visual models; model iteration has entered a period of acceleration.

Before Sora, although large language models gradually became the main research direction following GPT's success, diffusion models still dominated visual generation: widely used visual models such as DALL·E and Stable Diffusion all use diffusion. The large-language-model route that Google proposed in 2023 did not perform well in the video field, but the problem lay mainly in how videos were represented for the model rather than in the model itself, which still demonstrated the viability of the large-language-model route for text-to-video. Sora's breakthrough is that it is built on the DiT (Diffusion Transformer) architecture, combining the advantages of large language models and diffusion models. Diffusion models can also be scaled up, proving that the brute-force scaling behind GPT-4 can produce the same "emergent" effects in the vision domain. Sora marks the success of the diffusion-plus-language-model fusion route and has great iteration potential ahead, comparable to GPT-3 as a landmark. Continued iteration along this path is expected to produce even more realistic visual models within the next one to two years.
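As a rough illustration of the DiT idea described above (a sketch only; the patch sizes, video shape, and function name here are illustrative assumptions, not Sora's actual configuration), a video can be cut into "spacetime patches" that a transformer then treats as tokens, much as a language model treats words:

```python
import numpy as np

def spacetime_patchify(video, pt=2, ph=16, pw=16):
    """Cut a video array (T, H, W, C) into flattened spacetime patches.

    Each patch spans pt frames x ph x pw pixels; a diffusion transformer
    would denoise these tokens jointly, which is what lets it keep a
    subject consistent across frames. Sizes here are illustrative.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Split each axis into (num_patches, patch_size) pairs ...
    patches = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # ... group the patch-index axes together, then flatten each patch.
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(-1, pt * ph * pw * C)

# A toy 16-frame, 256x256 RGB clip becomes a sequence of 2048 tokens,
# each a 1536-dimensional vector (2*16*16*3 values).
video = np.zeros((16, 256, 256, 3))
tokens = spacetime_patchify(video)
print(tokens.shape)  # (2048, 1536)
```

The key design point is that the token sequence has no fixed length: clips of different duration, aspect ratio, or resolution simply yield more or fewer tokens, which is how such a model can handle variable-sized video.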

Sora greatly boosts demand for computing power and drives investment in hardware construction.

According to rough estimates by Dr. Saining Xie, one of the proposers of the DiT architecture, the Sora model has roughly 3 billion parameters. Based on research into available training data, major overseas video sites receive about 500 hours of uploaded video every minute. From this, the bank estimates that training the Sora model requires about 70,900 H100 GPUs running for one month. On the inference side, according to relevant research estimates, generating one image consumes compute roughly equivalent to generating 256 words of text; from this, the bank estimates that generating a one-minute short video consumes about a thousand times more computing power than generating a text conversation. Computing power will remain in short supply in the short to medium term and will not fully meet inference-side demand.
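The thousand-fold inference claim can be sanity-checked with a back-of-envelope calculation. The frame rate and chat length below are my own assumptions, not figures from the report; only the 256-words-per-image equivalence comes from the text:

```python
# Back-of-envelope check of the ~1000x inference-cost claim.
TOKENS_PER_IMAGE = 256   # report's figure: one image ~= 256 words of text
FPS = 30                 # assumed frame rate (not from the report)
VIDEO_SECONDS = 60       # one-minute clip
CHAT_WORDS = 500         # assumed length of a text conversation

video_cost = TOKENS_PER_IMAGE * FPS * VIDEO_SECONDS  # 460,800 word-equivalents
chat_cost = CHAT_WORDS
ratio = video_cost / chat_cost
print(f"video/chat compute ratio ~ {ratio:.0f}x")  # ~922x, i.e. on the order of 1000x
```

Under these assumptions the ratio lands near 1,000x, consistent with the bank's order-of-magnitude estimate; a shorter chat reply or higher frame rate would push it higher.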

Investment advice: Sora opens up AIGC application space in the vision domain. The continued shortage of computing power networks is driving demand for hardware infrastructure construction.

In the optical module segment, the bank focuses on recommending the core North American optical module suppliers Zhongji Xuchuang (300308.SZ) and Xinyisheng (300502.SZ), their upstream core supplier Tianfu Communications (300394.SZ), and leading domestic optical chip manufacturer Yuanjie Technology (688498.SH);

In the switch segment, the bank recommends focusing on Ziguang (000938.SZ) and Ruijie Networks (301165.SZ), as well as domestic switch chip leader Shengke Communications (688702.SH), and also recommends domestic ICT giant ZTE (000063.SZ);

In the video codec segment, the bank suggests focusing on high-quality video codec companies Danghong Technology (688039.SH) and Weihai (301318.SZ).

Risk warning: assumptions behind the core calculations may be inaccurate, implementation progress of the Sora model may fall short of expectations, and the industry's competitive landscape may deteriorate.

The translation is provided by third-party software.

