Source: CITIC Securities Research
Recently, OpenAI released GPT-4, a multi-modal large language model, in closed-source form. The model follows the same technical route as GPT-3.5/ChatGPT but delivers better creativity, collaboration, reasoning ability, and safety. In addition, a customized supercomputer was used to supply computing power during training, and the basic theory of large-language-model scaling laws was refined, making training resources more controllable.
We judge that the strong results of GPT-4 are likely to drive the global AI technology stack to continue converging toward LLMs (large language models), and to keep accelerating the development of the AI industry through a combination of brute-force scaling and engineering technique, bringing more application scenarios to fruition while moving humanity closer to artificial general intelligence (AGI).
We remain optimistic about investment opportunities around OpenAI and the global AI industry, and suggest continuing to focus on core links such as chips, computing infrastructure, model architecture & engineering practice, and application scenarios.
The origin of the report: OpenAI officially released GPT-4
On the evening of March 14, 2023, Beijing time, OpenAI released the official version of GPT-4, which replaced the GPT-3.5 version previously used by ChatGPT and began serving paid Plus users. OpenAI said on its official website that although GPT-4 is less capable than humans in most real-world scenarios, its performance on some professional and academic benchmarks is already on par with humans. Based on a detailed analysis of GPT-4's underlying technical logic and implemented functions, this report explores GPT-4's possible impact on the technological path of the global AI industry, as well as changes and opportunities at the industry level.
GPT-4: a fully closed-source release of a multi-modal large language model that took 6 months of iterative refinement
The way OpenAI launched GPT-4 differs from its previous model releases. OpenAI neither published a paper on GPT-4 nor provided a detailed description of the framework; it released only a 98-page technical report, which mainly describes the model's capabilities and evaluation scores, with almost no technical details. In doing so, OpenAI blocks every direct avenue that would-be imitators could reference (model size, dataset construction, training methods, etc.), sticking to the closed-source path to the end. This matches our earlier judgment on how the industry will develop: the frontrunners (OpenAI, Google) will stay closed-source to prevent other companies from reproducing their models, while companies one or two positions behind (Meta, Amazon, NVIDIA, etc.) may choose the open-source route, hoping to accelerate iteration through the power of the community.
According to the technical report published by OpenAI, GPT-4's training and iteration took more than 6 months, more than double the previously disclosed figure for ChatGPT. The technical path remains an autoregressive Transformer model plus reinforcement learning from human feedback (RLHF). The biggest improvement in model capability is the introduction of multi-modal processing: in addition to the text input ChatGPT already supported, GPT-4 can also accept image input, though this is not yet open to users. Furthermore, the model's reliability on complex tasks and the safety of its output have improved significantly.
Key points of model training: a customized supercomputer, and a refined basic theory of large-language-model scaling laws
Although OpenAI did not disclose the model or specific training details, two points in the technical report stand out as potentially affecting the entire industry:
1) OpenAI began working with Microsoft last year to build a dedicated supercomputer for large-language-model training, and the GPT-4 training and iteration cycle appears to have been completed entirely on this machine. According to Bloomberg's reporting, the computer built by OpenAI and Microsoft cost hundreds of millions of dollars in total and uses nearly 10,000 NVIDIA A100 GPUs, which is consistent with our previous report estimating the number of GPUs needed to train large language models. Judging from the results OpenAI describes in the technical report (training plus iteration took 6 months in total), the GPT-4 training process was far shorter than previously anticipated (based on earlier papers, the iteration phase alone for a model of this size would take several months), which underscores the case for building a dedicated supercomputer. We believe that in the coming months more AI giants will emulate OpenAI's approach and put customized supercomputers on their agendas.
2) Scaling laws come from a paper the OpenAI team published in 2020, which estimates the relationship between model capability, model size, and training time, and which has become an important theory in large-language-model research. In this technical report we see the scaling laws further refined: OpenAI says that during the development of GPT-4 it improved its scaling-law methodology so that previously unexplained emergent abilities (new capabilities that appear suddenly once model size passes a certain threshold) can be better predicted. A refined scaling law also means the resources invested in model training become more controllable: AI vendors will no longer need to blindly expand parameters in pursuit of emergent abilities, which will further reduce the cost of the AI training phase.
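The GPU estimate above can be sanity-checked with a back-of-envelope calculation using the common approximation that training a Transformer takes roughly 6 x N x D floating-point operations (N = parameters, D = training tokens). All concrete numbers below (parameter count, token count, utilization) are illustrative assumptions for the sketch, not figures disclosed by OpenAI:

```python
def training_days(params: float, tokens: float,
                  gpu_flops: float = 312e12,   # A100 peak BF16/TF32 throughput
                  utilization: float = 0.3,    # typical achieved fraction of peak
                  num_gpus: int = 10_000) -> float:
    """Estimated wall-clock days to train, using FLOPs ~ 6 * N * D."""
    total_flops = 6 * params * tokens
    flops_per_second = num_gpus * gpu_flops * utilization
    return total_flops / flops_per_second / 86_400  # seconds per day

# Hypothetical GPT-3-scale run: 175B parameters, 300B tokens
days = training_days(175e9, 300e9)
```

Under these assumptions the raw training pass takes only days on a 10,000-GPU cluster, which is consistent with the report's observation that most of the 6-month timeline went to iteration and refinement rather than a single training run.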
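The scaling law's core claim can be sketched concretely: OpenAI's 2020 paper fits test loss as a power law in model size, L(N) = (N_c / N)^alpha. The constants below are the published fits for non-embedding parameters; treat them as illustrative of the functional form rather than predictive of GPT-4:

```python
def predicted_loss(n_params: float,
                   n_c: float = 8.8e13,   # fitted constant (parameters)
                   alpha: float = 0.076) -> float:
    """Power-law prediction of cross-entropy test loss vs. model size."""
    return (n_c / n_params) ** alpha

# Larger models are predicted to reach lower loss, smoothly and predictably:
loss_small = predicted_loss(1e8)    # 100M-parameter model
loss_large = predicted_loss(1e11)   # 100B-parameter model
assert loss_large < loss_small
```

The significance of the refinement claimed for GPT-4 is that curves like this one, fitted on small cheap runs, could reportedly predict the final model's performance in advance, which is what makes training spend controllable.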
Application scenarios: multi-modal capability accelerates innovation across multiple fields
The most intuitive change from ChatGPT is GPT-4's new support for multi-modal image input. Although OpenAI says the current focus of its multi-modal capability is still image-to-text, with no support yet for audio, video, or image editing, this still leaves the market plenty of room for imagination.
1) Search: combining image input with text will strengthen the current model of traditional search engines assisted by large language models.
2) Intelligent customer service: a combined image-and-text input model better addresses some of the pain points currently faced by consumer-facing (ToC) smart customer service.
3) Fine-tuned models from small and medium-sized companies applied to specific verticals: GPT-4 is positioned as a general-purpose large language model, and judging from OpenAI's documentation, the company has no interest in fine-tuning for specific verticals to obtain better results. That work will naturally fall to small and medium-sized AI vendors, which can fine-tune GPT-4 for industry segments to achieve better results.
Risk factors: AI core technology development falling short of expectations; continued policy regulation of the technology sector; global macroeconomic recovery falling short of expectations; macroeconomic fluctuations causing IT spending by European and American enterprises to fall short of expectations; global cloud computing market development falling short of expectations; enterprise data breaches and information security incidents; continued intensification of industry competition; etc.
GPT-4 follows the same technical path as GPT-3.5/ChatGPT but brings better creativity, collaboration, reasoning ability, and multi-modal capability. Its strong results are expected to drive the AI technology stack to continue converging toward LLMs, to keep accelerating the development of the AI industry through a combination of brute-force scaling and engineering technique, and to move humanity closer to artificial general intelligence (AGI). We remain optimistic about investment opportunities in OpenAI and the broader AI field, and suggest continuing to focus on core links such as chips, computing infrastructure, model architecture & engineering practice, and application scenarios.