
Opinion | GPT-4 Released, Continuing to Approach Artificial General Intelligence (AGI)

CITIC Securities Research · Mar 16, 2023 10:43

Source: CITIC Securities Research

Recently, OpenAI released GPT-4, a multimodal large language model, as a closed-source model. GPT-4 follows the same technical route as GPT-3.5/ChatGPT but delivers better creativity, collaboration, reasoning ability, and safety. A custom-built supercomputer supplied the computing power for training, and refinements to the scaling-law theory of large language models made training resource requirements more controllable.

We judge that GPT-4's strong results are likely to drive the global AI technology stack to keep converging on the LLM (large language model) paradigm, to accelerate the development of the AI industry through a combination of brute-force scale and engineering craft, and to bring more application scenarios to fruition, while also moving humanity closer to artificial general intelligence (AGI).

We remain optimistic about investment opportunities around OpenAI and the global AI industry, and suggest continued focus on core links such as chips, computing infrastructure, model architecture & engineering practice, and application scenarios.

Background of this report: OpenAI officially releases GPT-4

On the evening of March 14, 2023, Beijing time, OpenAI released the official version of GPT-4, which replaced the GPT-3.5 model previously used by ChatGPT and began serving paid Plus subscribers. On its website, OpenAI noted that although GPT-4 is less capable than humans in most real-world scenarios, its performance on some professional and academic benchmarks is already on par with humans. Based on a detailed analysis of GPT-4's underlying technical logic and delivered functionality, this report explores GPT-4's likely influence on the technology path of the global AI industry, as well as the resulting changes and opportunities at the industry level.

GPT-4: a fully closed-source, multimodal large language model that took six months of iterative refinement

OpenAI launched GPT-4 in a different format from its previous model releases. It neither published a paper on GPT-4 nor provided a detailed description of the architecture; it released only a 98-page technical report, which mainly describes the model's capabilities and evaluation scores and contains almost no technical detail. In this way, OpenAI withholds everything an imitator would need (model size, dataset construction, training method, etc.) and commits to the closed-source path to the end. This matches our earlier judgment on how the industry will develop: front-running companies (OpenAI, Google) will stick to the closed-source route to prevent others from reproducing their models, while companies one or two positions behind (Meta, Amazon, NVIDIA, etc.) may choose the open-source route, hoping to accelerate iteration through the power of the community.

According to the technical report, GPT-4's training and iteration took more than six months, more than double the time previously disclosed for ChatGPT. The technical path remains an autoregressive Transformer model plus reinforcement learning from human feedback (RLHF). The biggest capability upgrade is the introduction of multimodal processing: in addition to the text input ChatGPT already supported, GPT-4 can also accept image input, although this is not yet open to users. The model's reliability on complex tasks and the safety of its output have also improved significantly.

Key points of model training: a customized supercomputer, and refinements to the scaling laws of large language models

Although OpenAI did not publish the model or specific training details, from its account in the technical report we identified two points that may affect the entire industry:

1) OpenAI began working with Microsoft last year to build a dedicated supercomputer for large-language-model training, and this round of GPT-4 training and iteration was most likely completed entirely on that machine. According to Bloomberg, the computer OpenAI and Microsoft built cost hundreds of millions of dollars in total and used nearly 10,000 NVIDIA A100 GPUs, consistent with our earlier report estimating the number of GPUs needed to train large language models. Judging from the results described in the technical report (training plus iteration took six months in total), the GPT-4 training process was far shorter than previously expected (extrapolating from earlier papers, the iteration phase alone for a model of this size should take several months), which underscores the value of building a dedicated supercomputer. We believe that in the coming months more AI giants will emulate OpenAI and put customized supercomputers on their agendas.
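As a rough sanity check on the cluster-size math above, here is a back-of-envelope sketch using the widely used ≈6·N·D FLOPs heuristic for Transformer training. All model and hardware numbers are illustrative assumptions (OpenAI has not disclosed GPT-4's parameter or token counts), not figures from the report.

```python
# Back-of-envelope training-time estimate for a large language model.
# All model/hardware numbers below are illustrative ASSUMPTIONS, not
# disclosed GPT-4 figures.

def training_days(params: float, tokens: float, num_gpus: int,
                  flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days, using the common ~6*N*D FLOPs heuristic
    (one forward+backward pass per token over the whole dataset)."""
    total_flops = 6 * params * tokens
    cluster_flops_per_sec = num_gpus * flops_per_gpu * utilization
    return total_flops / cluster_flops_per_sec / 86_400  # seconds -> days

# Assumed: a 1-trillion-parameter model trained on 2 trillion tokens,
# on 10,000 A100s (312 TFLOPS peak BF16) at 30% sustained utilization.
days = training_days(1e12, 2e12, 10_000, 312e12, 0.30)
print(f"~{days:.0f} days of pure compute")  # ~148 days under these assumptions
```

Under these assumed inputs the pure-compute time lands at roughly five months, which is at least consistent in order of magnitude with a six-month training-and-iteration cycle on a cluster of that size.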

2) The scaling laws come from a paper published by the OpenAI team in 2020. It estimates the relationships among model capability, model size, and training compute, and has become an important theoretical tool in large-language-model research. In this technical report, we see that the scaling laws have been further refined: OpenAI says they were improved during GPT-4's development, making it possible to better predict previously unexplained phenomena such as emergent capabilities (new abilities that suddenly appear once model size crosses a certain threshold). Better scaling laws mean the resources invested in model training become more controllable: AI vendors will no longer need to blindly scale up parameters in pursuit of emergent capabilities, which should further reduce the cost of the AI training phase.
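For reference, the 2020 paper ("Scaling Laws for Neural Language Models", Kaplan et al.) fits test loss $L$ as simple power laws in non-embedding parameter count $N$, dataset size $D$ (tokens), and compute $C$. The exponents below are the paper's fitted values for its Transformer setup, not GPT-4 numbers:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050
```

The small exponents mean loss falls slowly but predictably with scale; refining fits of this kind is what allows a lab to forecast a large run's performance from much smaller training experiments.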

Application scenarios: multimodal capability accelerates innovation across multiple fields

The most visible change from ChatGPT to GPT-4 is support for image input. Although OpenAI says the current focus of its multimodal capability is still image-to-text, with no support yet for audio, video, or image editing, this already gives the market plenty of room for imagination.

1) Search: image input complements the emerging model of traditional search engines augmented by large language models.

2) Intelligent customer service: combined image-and-text input directly addresses some of the pain points of current consumer-facing (ToC) intelligent customer service.

3) Fine-tuned models from small and mid-sized vendors for vertical segments: GPT-4 is positioned as a general-purpose large language model, and judging from OpenAI's documentation, the company has no interest in fine-tuning for specific verticals to squeeze out better results. That work will naturally fall to small and mid-sized AI vendors, who can fine-tune on top of GPT-4 for industry segments to obtain better results.

Risk Factors:

Risk of AI core-technology development falling short of expectations; risk of continued policy regulation in the technology sector; risk of global macroeconomic recovery falling short of expectations; risk that macroeconomic fluctuations cause European and American enterprise IT spending to miss expectations; risk of the global cloud computing market developing below expectations; risks of enterprise data breaches and information security incidents; risk of industry competition continuing to intensify; etc.

Investment Strategy:

GPT-4 follows the same technical path as GPT-3.5/ChatGPT but delivers better creativity, collaboration, reasoning, and multimodal capability. Its strong results are likely to drive the AI technology stack to keep converging on the LLM paradigm, to accelerate the development of the AI industry through a combination of brute-force scale and engineering craft, and to move humanity closer to artificial general intelligence (AGI). We remain optimistic about investment opportunities around OpenAI and the broader AI field, and suggest continued focus on core links such as chips, computing infrastructure, model architecture & engineering practice, and application scenarios.

Editor/Somer

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.