
Opinion | GPT-4's blockbuster release is expected to accelerate the convergence of photo/video applications, games, and virtual humans

Source: CITIC Construction Investment Securities Research · Mar 17, 2023 09:34

GPT-4 brings significantly improved comprehension, joint understanding of images and text, and a customizable personality. As far as the application field is concerned, we can already see the possibility that multi-modal models will help applications increase revenue, reduce costs, and improve efficiency at the same time. We have previously compared the present moment to the eve of the mobile internet explosion, and we expect GPT-4 to accelerate this process.

Among them, we believe “multi-modal + photo/video applications” is the foundation for application development. “+ games” will increase revenue by lifting demand, while also reducing R&D costs for large games and marketing costs for small and medium-sized games. “+ virtual humans” will address the “false demand” criticism that has arisen because virtual humans feel stereotyped and artificial, which has limited the industry's development.

OpenAI officially released GPT-4 on March 15. According to OpenAI, GPT-4 is a multi-modal model that can understand both text and images and return text output, with comprehension ability superior to GPT-3.5 and ChatGPT. GPT-4's text input and output capabilities have already been rolled out to ChatGPT and an API has been opened; the image input capability is being piloted in partnership with Be My Eyes. According to the Be My Eyes website, its Virtual Volunteer feature will be powered by GPT-4, and users can already join the waitlist for this feature in the iOS and Android apps.
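As a concrete illustration of the newly opened API, here is a minimal sketch using the openai Python package as it existed at release (the pre-1.0 ChatCompletion interface); the API key and prompt are placeholders of our own.

```python
# Minimal sketch: calling GPT-4 through the newly opened API, using the
# pre-1.0 openai package interface. Key and prompt are placeholders.
import openai

openai.api_key = "sk-..."  # placeholder; supply your own key

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Summarize the key upgrades in GPT-4."}
    ],
)
print(response["choices"][0]["message"]["content"])
```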

According to OpenAI's official website, compared with ChatGPT and GPT-3.5, GPT-4's main improvements fall into the following six areas:

1) GPT-4's comprehension ability has been greatly optimized, and we expect it to significantly improve the user experience in productivity scenarios such as office work. According to OpenAI, GPT-4 with visual ability achieves better results on most simulated AP, SAT, GRE, and US law exams: of 26 mock exams, GPT-4 scored better on 17, and in science subjects such as calculus, chemistry, and physics its percentile ranking improved by nearly 40%. According to The Verge, ChatGPT previously made frequent mistakes in mathematical calculation; judging from the results OpenAI has shown this time, its mathematical and logical reasoning has improved markedly. The biggest jump in ranking came on the simulated US bar exam, where GPT-3.5 ranked in the bottom 10% while GPT-4 reaches roughly the top 10%.

2) The multi-modal model understands text and images jointly to optimize its responses, which we expect to be especially helpful for the user experience in education. GPT-4's multi-modal model can extract information from both images and text, process them as unified data, and respond in text. In OpenAI's demonstration, GPT-4 could explain why the iPhone charging cable in a joke image was absurd. We believe this joint understanding of images and text can improve interactive experiences: in education, for example, what used to be simple text/speech interaction evolves into combined visual and language understanding with better feedback, which is expected to enrich the forms of education and thereby improve its quality.
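Image input was not yet open to API users at the time of writing, so the request below is purely hypothetical, shown only to illustrate what a combined image-and-text query might look like; the message structure and URL are our own assumptions.

```python
# Hypothetical sketch of a mixed image + text request. Image input was not
# yet available through the public API when this article was published, so
# this message format is an assumption for illustration only.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",  # a vision-capable variant would be required
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is funny about this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/funny-cable.jpg"},
                },
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```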

3) GPT-4 performs better in non-English scenarios. OpenAI used Azure Translate to translate 14,000 multiple-choice questions across 57 subjects into 26 languages and tested GPT-4 on them. In 24 of those languages, GPT-4's accuracy exceeded the English-language performance of LLMs such as GPT-3.5, Chinchilla, and Google's PaLM, including in low-resource languages such as Latvian, Welsh, and Swahili. Seen from another angle, this shows that GPT-4's language understanding is also superior to that of other LLMs.
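OpenAI's exact evaluation harness is not public; the sketch below reconstructs the protocol described above under our own assumptions, with translate() as a placeholder for an Azure Translate call and a simplified multiple-choice format.

```python
# Sketch of the multilingual evaluation protocol described above: translate
# multiple-choice questions, query the model, and score accuracy. translate()
# and the question format are placeholders; OpenAI's harness is not public.
import openai

def translate(text: str, target_lang: str) -> str:
    """Placeholder for a machine-translation call (e.g., Azure Translate)."""
    raise NotImplementedError

def accuracy(questions: list[dict], target_lang: str) -> float:
    correct = 0
    for q in questions:  # each q: {"prompt": "...", "answer": "A"}
        prompt = translate(q["prompt"], target_lang)
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": prompt + "\nAnswer with A, B, C, or D."}],
        )
        reply = resp["choices"][0]["message"]["content"].strip()
        correct += reply.startswith(q["answer"])
    return correct / len(questions)
```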

4) GPT-4's steerability will give the AI distinct personalities and is expected to further advance the possibility of virtual humans becoming “people.” In contrast to ChatGPT's fixed language style, GPT-4 lets API users customize the AI's “personality,” which we expect will further optimize the feedback mechanism of virtual humans. This is similar to the domestic AI conversation app Glow, which lets users talk with virtual characters with different backgrounds and settings, such as Tony Stark of “Iron Man”; bringing such technology into virtual-human scenarios can make the virtual human a real “person.”

Therefore, we think ChatGPT freed virtual humans from scripted responses: gaining an AI feedback mechanism and becoming a “person” was the first step, while GPT-4 unlocks the second stage of virtual-human development, making them “people” with very different personalities. This helps answer the criticism that conversation and interaction with virtual humans is a “false demand,” a doubt that arose because virtual-human development has been limited by stereotyped behavior and uniform personality.
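In the API, this steerability is exposed through the system message; below is a minimal sketch in which the persona text is our own illustrative example, not an official template.

```python
# Minimal sketch of steering GPT-4's "personality" via the system message.
# The persona text is our own illustrative example.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are Tony Stark: confident, sarcastic, and fond of "
                "engineering metaphors. Stay in character at all times."
            ),
        },
        {"role": "user", "content": "What do you think of my drone design?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```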

5) In terms of experience, GPT-4 has stronger safeguards around safety, ethics, and the law. OpenAI's developers have continuously refined the model against the harmful and leading prompts users raised after the earlier models were opened up, so GPT-4 is now more guarded on questions of safety, ethics, and legality.

6) GPT-4 lets users input much longer content. Compared with the roughly 4,096-token limit of GPT-3.5 and ChatGPT, GPT-4 accepts inputs of up to 32,768 tokens (over 25,000 words, according to OpenAI), eight times the previous limit. As a result, GPT-4 can sustain more rounds of continuous conversation with users without quickly “forgetting” what was said earlier.
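Since the limits are measured in tokens rather than words, a quick way to check how much of the window a prompt consumes is OpenAI's tiktoken tokenizer; the per-model limits in this sketch are those stated in OpenAI's announcement.

```python
# Counting tokens against the context limits discussed above using OpenAI's
# tiktoken tokenizer. Limits are those stated in the GPT-4 announcement.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

text = "An example prompt whose length we want to check."
n_tokens = len(enc.encode(text))

for model, limit in [("gpt-3.5-turbo", 4096), ("gpt-4-32k", 32768)]:
    print(f"{model}: {n_tokens}/{limit} tokens, fits: {n_tokens <= limit}")
```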

Take the generative AI startups backed by Y Combinator as an example: most of their applications are primarily text-in, text-out products, covering customer service, office assistance, technology finance, and so on, followed by derivative applications of text-to-image generation, such as producing short videos in different art styles (art illustration collages), 3D game models, and game assets.

With this release of the multi-modal GPT-4, we believe that, on the one hand, in interactive applications such as productivity tools, education, and customer service, GPT-4's assistive capabilities have improved further and the user experience of existing scenarios has been optimized. On the other hand, we also see the possibilities the multi-modal model opens up. This upgrade is on the input side, extending text understanding to joint understanding of text and images; in the future we can look forward to output that combines text with images, video, and more, enabling richer functionality in application scenarios such as photo/video applications, games, and virtual humans.

We believe that “multi-modal + photo/video applications” is the foundation of the application field, improving production efficiency and reducing costs. At present, the forms of applications that integrate AIGC technology are still fairly homogeneous: most remain derivative applications of text-to-image generation. The multi-modal model, however, makes it possible to jointly understand text, images, video, and other content forms, and to output combinations of them. It can ultimately serve not only as an entertainment and production tool for everyday C-side scenarios, but also as an auxiliary tool in producing content such as games and virtual humans. Therefore, we believe “multi-modal + photo/video applications” is the foundation for implementation in the application field.

“Multi-modal + games”: 1) Lift industry demand: strengthen interactivity and address the pain point of slowing demand. After a brief boost at the start of the pandemic, overall market demand has been weak. According to the China game industry report, actual sales revenue of the Chinese game market in 2022 was 265.88 billion yuan, down 10.3% year on year, a decrease of 30.63 billion yuan. Applying multi-modal AIGC models is expected to enhance games' interactive experience. For example, NetEase has applied AIGC technology to NPCs in “Justice” (Ni Shui Han) to enrich player interaction. Going forward, we expect AIGC to break games' fixed narrative patterns, expand the amount of game content, enhance interactivity, and ultimately use technology to counter the slowdown in game demand.
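To make the NPC idea concrete, here is a minimal sketch of how an LLM-backed NPC could be wired up (a fixed persona plus accumulated dialogue history); this is our own illustration, not NetEase's implementation.

```python
# Minimal sketch of an LLM-backed game NPC: a fixed persona in the system
# message plus accumulated dialogue history. Our own illustration, not
# NetEase's actual implementation.
import openai

history = [
    {
        "role": "system",
        "content": (
            "You are an innkeeper NPC in a wuxia world. Answer in character, "
            "briefly, and never mention being an AI."
        ),
    }
]

def npc_reply(player_line: str) -> str:
    history.append({"role": "user", "content": player_line})
    resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
    reply = resp["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(npc_reply("Any rumors from travelers passing through?"))
```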

2) Reduce costs: large games cut R&D costs, while small and medium games cut marketing costs. Beyond increasing revenue, multi-modal models can also produce richer game content at lower production cost. For example, as reported by TechCrunch, a team at the IT University of Copenhagen applied AIGC technology to “Super Mario” to build MarioGPT, which can generate effectively unlimited levels; approaches like this can reduce R&D costs for large-scale games.
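MarioGPT itself fine-tunes a language model on encoded level data; as a loose illustration of the same prompt-to-level idea, the sketch below asks a chat model to emit a tile map, with the prompt and tile legend entirely our own invention.

```python
# Loose illustration of prompt-driven level generation in the spirit of
# MarioGPT. The tile legend and prompt are our own invention; MarioGPT
# actually fine-tunes a language model on encoded level data.
import openai

PROMPT = (
    "Generate a 2D platformer level as 14 rows of 40 characters.\n"
    "Legend: '-' empty, 'X' ground, '?' question block, 'E' enemy, 'p' pipe.\n"
    "Keep it playable: mostly continuous ground with a few small gaps."
)

resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": PROMPT}],
)
level_text = resp["choices"][0]["message"]["content"]
print(level_text)  # would be parsed into tiles and playtested downstream
```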

For small and medium-sized games, R&D accounts for a limited share of costs, so the cost-reduction logic is closer to that of advertising and marketing. Based on the content users watch on platforms such as Weibo and Douyin, plus external signals such as weather and location, “a thousand faces for a thousand people” personalized ad creatives can be generated, ultimately raising advertising ROI. The multi-modal model can therefore reduce the production cost of ad materials and improve ad effectiveness, lowering customer-acquisition costs for small and medium-sized games.

“Multi-modal + virtual humans”: become real “people” and address the industry's development pain point. Because today's virtual humans either feel artificial or, when AI-driven, share a single uniform personality, the question has been raised of whether virtual humans are a “false demand.” This GPT-4 release shows that an AI can already have individuality. At the same time, multi-modality combines text/language with image/visual understanding, enabling virtual humans to better understand real human feelings and respond to them, enhancing the interactive experience and addressing the industry's development pain points.

We believe the multi-modal GPT-4 gives photo/video applications, games, and virtual humans more ways to adopt AIGC technology in their development, which will help increase revenue, reduce costs, and improve efficiency at the same time, and ultimately improve the valuation flexibility of the sector and of individual stocks.

Risk warning:

The development of generative AI technology falling short of expectations; technology integration in various fields falling short of expectations; computing power support falling short of expectations; data quality and quantity falling short of expectations; user demand falling short of expectations; technology monopoly risk; risk of bias in the original training data; risk of algorithmic bias and discrimination; risk around algorithm transparency; risk of increased regulatory difficulty; policy supervision risk; commercialization capability falling short of expectations; improvement of relevant laws and regulations falling short of expectations; copyright ownership risk; deepfake risk; risks to human rights and to the health and security of the internet content ecosystem; risk of insufficient corporate identification and governance capability; and risk of changes in users' aesthetic preferences.

Editor/Somer


