Source: CITIC Securities Research
OpenAI recently released GPT-4, a multimodal large language model, under a fully closed-source model. GPT-4 follows the same technical route as GPT-3.5/ChatGPT but delivers better creativity, collaboration, reasoning ability, and safety. Its training ran on a custom-built supercomputer, and in the process OpenAI refined the scaling-law theory underlying large language models, making the resources required for training controllable.
We believe GPT-4's strong results are likely to push the global AI technology stack to keep converging on LLMs (large language models), and that the combination of brute-force scaling and engineering craft will keep accelerating the AI industry, bringing more application scenarios to market while moving humanity steadily closer to artificial general intelligence (AGI).
We remain positive on OpenAI and on investment opportunities across the global AI industry, and recommend continued focus on the core links of the value chain: chips, computing infrastructure, model architecture & engineering practice, and application scenarios.
Background: OpenAI officially releases GPT-4
On the evening of March 14, 2023, Beijing time, OpenAI released the official version of GPT-4, which replaces the GPT-3.5 version previously used by ChatGPT and is now available to paying Plus subscribers. On its website, OpenAI notes that while GPT-4 is still less capable than humans in most real-world scenarios, it already performs at a human level on a number of professional and academic benchmarks. Building on a detailed analysis of GPT-4's underlying technical logic and delivered capabilities, this report examines how GPT-4 may shape technology choices across the global AI industry, along with the changes and opportunities it creates at the industry level.
GPT-4: a multimodal large language model released fully closed-source after six months of iterative refinement
The way OpenAI launched GPT-4 differs from all of its previous model releases. OpenAI published neither a paper on GPT-4 nor a detailed description of its architecture, providing only a 98-page technical report that mostly covers the model's capabilities and benchmark scores and contains almost no technical detail. In doing so, OpenAI has cut off every direct avenue for imitators (model size, dataset construction, training methods, etc.) and committed fully to the closed-source route. This matches our earlier judgment on how the industry would evolve: the leading companies (OpenAI, Google) will stay closed-source to prevent others from reproducing their models, while companies one or two steps behind (Meta, Amazon, NVIDIA, etc.) are likely to go open-source, hoping the community will help them iterate faster.
According to OpenAI's technical report, training and iterating on GPT-4 took more than six months, over twice as long as for ChatGPT. Technically, it retains the autoregressive Transformer architecture combined with reinforcement learning from human feedback (RLHF). The biggest capability upgrade is multimodality: in addition to the text input ChatGPT already supported, GPT-4 can also accept images, although this feature has not yet been opened to users. The model's reliability on complex tasks and the safety of its outputs have also improved markedly.
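For readers unfamiliar with the "autoregressive" part, a minimal sketch of the decoding loop is below. GPT-4 itself is closed, so a public GPT-2 checkpoint from Hugging Face stands in purely to illustrate the mechanism; the RLHF alignment stage is a separate training step and is not shown.

```python
# Minimal sketch of the autoregressive decoding loop behind GPT-class models:
# each step feeds every token generated so far back into the network and
# picks the next one. GPT-2 is used only as a stand-in for the closed GPT-4.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("Scaling laws say that", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                    # generate 20 tokens, one per step
        logits = model(ids).logits         # shape: [batch, seq_len, vocab]
        next_id = logits[0, -1].argmax()   # greedy: most probable next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0]))
```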
Key points in model training: a custom supercomputer, and a refined scaling-law theory for large language models
Although OpenAI has disclosed neither the model's specifications nor its training details, two points in the technical report stand out as potentially significant for the whole industry:
1) Starting last year, OpenAI worked with Microsoft to build, from the ground up, a supercomputer dedicated to large-language-model training, and GPT-4's entire training and iteration cycle appears to have run on this machine. According to Bloomberg, the computer cost several hundred million dollars and uses nearly 10,000 NVIDIA A100 GPUs, consistent with our earlier estimate of the GPU count needed to train a large language model. Judging by the timeline OpenAI describes (six months for training plus iteration combined), GPT-4's training ran far faster than prior expectations: earlier papers suggest the iteration phase alone for a model of this size should take several months, which underscores why a dedicated supercomputer is necessary. We expect more large AI players to follow OpenAI's lead and put custom supercomputers on their agendas in the coming months.
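For a rough sense of why that much hardware matters: GPT-4's parameter and token counts are undisclosed, so the back-of-the-envelope figures below borrow GPT-3's published scale (roughly 175B parameters, 300B training tokens) as a stand-in, use the common $C \approx 6ND$ estimate of training compute from the scaling-law literature, and assume ~30% utilization of the A100's 312 TFLOPS dense BF16 peak.

$$
\begin{aligned}
C &\approx 6ND = 6 \times (1.75\times 10^{11}) \times (3\times 10^{11}) \approx 3.2\times 10^{23}\ \text{FLOPs},\\
R &\approx 10^{4}\ \text{GPUs} \times 312\ \text{TFLOPS} \times 0.3 \approx 9.4\times 10^{17}\ \text{FLOP/s},\\
t &\approx C/R \approx 3.4\times 10^{5}\ \text{s} \approx 4\ \text{days}.
\end{aligned}
$$

Even at GPT-3 scale, a single training pass on such a cluster is measured in days; the six months OpenAI reports is therefore dominated by repeated evaluation, alignment, and retraining cycles, which is exactly the workload a dedicated machine compresses.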
2) Scaling laws were set out in a paper the OpenAI team published in 2020, which estimated how model capability relates to model size and training duration and has since become a cornerstone of large-language-model research. The GPT-4 technical report shows this theory being refined further: OpenAI says that in developing GPT-4 it improved its scaling-law methodology to the point where previously unexplained emergent abilities (new capabilities that appear abruptly once a model passes a certain size) can be predicted more reliably. A more complete scaling law means the resources invested in training become more controllable: AI vendors will no longer need to blindly inflate parameter counts in pursuit of emergent abilities, which should further lower training costs.
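The 2020 paper's core result is that test loss falls as a smooth power law in parameter count $N$, dataset size $D$, and compute $C$; the exponents below are that paper's fitted values for its own setup (GPT-4's updated fits are not disclosed):

$$
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N},\qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D},\qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C},
$$

with $\alpha_N \approx 0.076$, $\alpha_D \approx 0.095$, $\alpha_C \approx 0.050$. The practical payoff claimed in the GPT-4 report is predictability: OpenAI says it extrapolated GPT-4's final loss from smaller runs trained with at most 1/10,000 of the final compute, before the large run began.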
Application scenarios: multimodality accelerates innovation across multiple fields
The most visible change in GPT-4 relative to ChatGPT is the multimodal ability to accept image input. Although OpenAI says the current focus is image-to-text, with no support yet for audio, video, or image editing, this still leaves the market plenty of room for imagination.
1) Search: multimodality with image input will better serve the emerging model of traditional search engines augmented by large language models.
2) Intelligent customer service: combined image-and-text input addresses several pain points in today's consumer-facing (ToC) customer-service bots.
3) Small and mid-sized model companies fine-tuning for vertical domains: GPT-4 is positioned as a general-purpose large language model, and OpenAI's documentation shows no interest in fine-tuning it for specific verticals. That work will naturally fall to smaller AI vendors, who can fine-tune on top of GPT-4 to get better results in their segments, along the lines of the sketch below.
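A hypothetical sketch of that workflow follows. GPT-4 could not be fine-tuned at launch, so the example targets OpenAI's then-current fine-tuning endpoint (openai-python v0.27) with the GPT-3-family "davinci" base model as a stand-in; the file name and hyperparameters are invented.

```python
# Hypothetical sketch: adapting a general-purpose base model to a vertical
# domain via OpenAI's fine-tuning endpoint as it existed in early 2023.
# GPT-4 fine-tuning was not offered at launch, so "davinci" stands in for
# the base model; "claims.jsonl" is an invented training file.
import openai

openai.api_key = "sk-..."  # placeholder key

# 1) Upload domain-specific training data: JSONL with one
#    {"prompt": ..., "completion": ...} pair per line.
upload = openai.File.create(
    file=open("claims.jsonl", "rb"),
    purpose="fine-tune",
)

# 2) Launch the fine-tuning job on a general base model.
job = openai.FineTune.create(
    training_file=upload.id,
    model="davinci",
    n_epochs=4,  # illustrative hyperparameter
)
print(job.id)  # poll this id until the job status is "succeeded"
```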
Risk factors:
Risks include: core AI technologies developing more slowly than expected; continued tightening of policy and regulation in the technology sector; a weaker-than-expected global macroeconomic recovery; macroeconomic volatility causing European and US enterprise IT spending to miss expectations; slower-than-expected growth of the global cloud-computing market; enterprise data breaches and information-security incidents; and continually intensifying industry competition.
Investment strategy:
GPT-4 follows the same technical route as GPT-3.5/ChatGPT but brings better creativity, collaboration, and reasoning, plus multimodal capability. Its strong results are likely to push the AI technology stack to keep converging on LLMs and, through the combination of brute-force scaling and engineering craft, to keep accelerating the AI industry, moving humanity steadily closer to AGI. We remain positive on OpenAI and on AI-sector investment opportunities, and recommend continued focus on the core links of the value chain: chips, computing infrastructure, model architecture & engineering practice, and application scenarios.
Editor/Somer