With the rise of ChatGPT, can generative AI reshape the underlying logic of tool software?

CICC Dianjing · Mar 3, 2023 15:55

Authors: Yu Zhonghai, Wang Zhihao, Wei Guanfei, Han Rui, Hu Anqi, Tan Zhe Xian

Source: CICC Dianjing

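ChatGPT moves a step closer to AGI, making it possible for general-purpose AI to empower application software, and its combination with tool software offers especially broad potential. For application software vendors, integrating large AI models costs little in the short term and offers large potential in the long term, so we observe that the vast majority of them are actively integrating large-model capabilities. At present, the capabilities of large AI models represented by ChatGPT lie mainly in conversational human-computer interaction and generative AI, and their combination with application software covers four main directions: AI + tool software, AI + search engines, AI + service applications, and AI + vertical industry applications. We believe generative AI is a natural fit for tool software, with broad downstream use cases and considerable potential.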

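Short term: integrating generative AI to boost productivity becomes the new focus of competition in tool software. Today, generative AI mainly helps users work more efficiently by being embedded in existing tool software, and many vendors are already experimenting in text (e.g., Notion AI), images (e.g., Stable Diffusion, Midjourney), video (e.g., Make-A-Video), 3D modeling, audio, and other fields. From a product-value perspective, we believe AI-integrated features may become a new paid add-on for tool software; from a competitive perspective, both AI-native newcomers and the speed at which incumbents adopt AI will change the current landscape. However, as generative AI applications spread, AI-integrated tool software may become standard, at which point the depth of AI integration into use cases will become the new focus of competition.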

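Long term: generative AI may reshape business logic, turning production tools into productive capacity. Ideally, we believe a true AGI of the future will be able to create without depending on commands and guidance from human users. AI-empowered tool software could then shift from providing production tools to providing productive capacity itself, and the providers of underlying AI capabilities and tool software vendors would jointly share in the value produced. To better understand how AI may reshape business logic over the long term, we compare AGI with cloud computing in terms of industry structure, business logic, competitive landscape, and value sharing. Just as moving to the cloud has become a must for application software today, we believe "AI+" may also become a standard feature of application software and unlock a new round of value.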
Risks

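Technological progress falls short of expectations; commercialization proceeds more slowly than expected; industry competition intensifies.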

Main text

Large AGI models are coming into their own, and generative AI is deeply empowering tool software

ChatGPT moves a step closer to AGI, making AI-empowered application software possible

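ChatGPT has set off a global AI boom, and the road to AGI may be getting shorter. ChatGPT (Chat Generative Pre-trained Transformer) is an AI chatbot developed by OpenAI. Built on the GPT-3.5 large model, it can handle relatively complex language tasks, including human-computer dialogue, automatic text generation, summarization, and code writing. Launched in November 2022, it reached 100 million users within two months, setting off another round of AI enthusiasm worldwide. ChatGPT's popularity has made the industry realize that AI is a step closer to AGI (artificial general intelligence), which in turn has prompted worldwide discussion and speculation about how AGI will reshape every industry.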

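Application software vendors worldwide are actively embracing the new AI ecosystem led by OpenAI. After ChatGPT's launch, Microsoft planned an additional US$10 billion investment in OpenAI and began exploring integrated use cases in its search and office software. Given the enormous potential ChatGPT has shown, application vendors around the world have also begun actively integrating OpenAI's APIs, hoping AI will spark something new in their existing products. The Chinese market has followed quickly: Baidu announced that its competing product, ERNIE Bot (Wenxin Yiyan), would complete internal testing in March and then open to the public, and more than a hundred Chinese companies have already announced they will integrate ERNIE Bot, including enterprise software vendors such as Hand Enterprise Solutions, Kingdee, and Yusys. We also expect more large models to emerge in China and abroad, attracting more application vendors and enriching and expanding the AI ecosystem.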

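For application software vendors, integrating large AI models requires little investment in the short term while offering considerable long-term potential. Because ChatGPT and other large models have only just been released and their business models are only beginning to take shape, the current priority is building an ecosystem rather than monetizing. OpenAI, Baidu, and other large-model vendors are therefore all keeping their APIs open to application software vendors for now. This means the near-term cost of integrating AI is low, while the positive long-term changes AI could bring to product forms and business logic are considerable. As a result, we observe that the vast majority of application vendors are actively integrating large-model capabilities, and the number of related applications is growing exponentially.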

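At present, the capabilities of large AI models represented by ChatGPT lie mainly in human-computer interaction and generative AI, and their combination with application software mainly covers the following directions: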

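► AI + tool software: assisting the production of text, images, and video. Integrating AI into creative tool software mainly draws on the generative capabilities of large models such as ChatGPT, which can complete assisted creative tasks such as generating text, images, and video from user instructions and guidance. Typical examples include Notion AI and Office (into which Microsoft plans to integrate ChatGPT) for text; Stable Diffusion (Stability AI), Midjourney, DALL-E (OpenAI), and Imagen (Google) for images; and Designs.ai, Make-A-Video (Meta), and Lumen5 for video.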

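► AI + search engines: using natural language processing to turn traditional search-and-click into interactive Q&A and generate personalized results. Combining AI with search mainly leverages conversational interaction based on natural language processing, helping users solve problems through question and answer and instantly generating personalized plans, suggestions, and analyses. A typical example is Microsoft's New Bing, which added interactive chat and writing assistance after integrating ChatGPT's capabilities.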

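► AI + service applications: using human-computer interaction to improve self-service experiences. AI is combined with service applications mainly in the form of self-service Q&A chatbots that leverage the interaction capabilities of LLMs. Typical examples include intelligent Q&A and help bots in e-commerce, gaming, maps, and other service scenarios.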

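► AI + vertical industry applications: combining AI with existing vertical applications. Typical examples include Yusys, Hand, Kingdee, and Hundsun, which integrate large models' interaction and generation capabilities to acquire information more efficiently, perform analysis, and produce intelligent solutions. We believe these are essentially one of the three forms above; going forward, vendors will need to explore how to fit AI better to vertical scenarios and train more deeply on industry-specific corpora to achieve better results.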

Chart: Main directions for integrating large AI models with application software

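Application software vendors' AI investments will focus on exploring AI use cases and integrating AI into existing applications. In terms of the division of labor across the AI industry chain, we believe large-model vendors will take on most of the underlying algorithm development and optimization, while application software vendors will concentrate on discovering and cultivating use cases and on integrating existing large AI models more deeply. For AGI to be industrialized, we believe advanced underlying models and well-matched applications on top are both indispensable, and AI vendors and application software vendors will develop a closer and more rational division of labor and cooperation.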

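Among these directions and use cases, we focus most on the potential of combining generative AI with tool software. Generative AI is among the most prominent capabilities of large models such as ChatGPT, and it is a natural fit for existing tool software (text creation tools, image creation tools, 3D modeling tools, and so on), with broad downstream use cases and considerable potential. In this report, we therefore focus on how generative AI empowers tool software and on the possibility that, in the long run, it will reshape the underlying business logic and industry ecosystem of tool software.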

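What possibilities will generative AI create for tool software? In the short term, generative AI is mainly embedded in existing tool software as an innovative assistive feature that helps users work more efficiently, and vendors can charge extra for it as a value-added service. In the long term, however, if generative AI becomes capable of creating proactively without user guidance, it could evolve from a production tool into productive capacity and genuinely replace part of the work of "creators". Our stance on generative AI is therefore: conservative in the short term, but not to be underestimated in the long term.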
Chart: Applications worldwide that have integrated, or plan to integrate, language models such as OpenAI's models and ERNIE Bot

Short term: integrating generative AI to boost productivity becomes the new focus of competition in tool software

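Today, generative AI mainly helps users work more efficiently by being embedded in existing tool software. Once integrated with generative AI, tool software can assist creation within the framework, instructions, and guidance specified by the user, reducing repetitive, mechanical, rule-based work and even taking on some creative tasks: for example, gathering and summarizing material from an existing corpus to produce text according to guidance, generating images and video from text descriptions, or assisting with parameter optimization in 3D modeling. Many vendors are already experimenting across modalities including text, 2D images, 3D models, audio, and video: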

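► Generative AI and text creation: among overseas vendors, Notion has a built-in AI writing assistant that automatically generates text for different scenarios from a user's description, and Microsoft plans to bring ChatGPT capabilities into Office; in China, Kingsoft Office's WPS offers document proofreading, full-text translation, and writing assistance. Beyond consumer (C-end) applications, some vendors have built AI writing products specifically for enterprises. A typical example is 4Paradigm's Shishuo, which combines large generative language models such as GPT with an enterprise's internal domain knowledge while supporting private deployment, meeting enterprise requirements for vertical industry knowledge, data security, and trustworthy content.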

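► Generative AI and image creation: many overseas companies have launched text-to-image products; popular ones include OpenAI's DALL·E 2, Stability AI's Stable Diffusion, and Midjourney. Their workflows are broadly similar: enter keywords to generate several AI paintings, then refine them and add details. The products differ in style: DALL·E 2 leans toward realism, Midjourney toward science fiction, while Stable Diffusion has no particular stylistic bias and can be steered through repeated attempts with detailed instructions. Chinese vendors are following suit, for example Kunlun Tech's Tiangong Qiaohui and Wondershare's Wanxing Aihua.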

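► Generative AI and audio creation: overseas, Google released AudioLM in October 2022, which generates audio in a style similar to an input audio clip, and in January 2023 launched MusicLM, which generates music directly from text and images; Microsoft released VALL-E in January 2023, which can imitate a person's voice, including the speaker's emotion and tone, from just three seconds of audio; there are also Stability AI's Dance Diffusion and OpenAI's Jukebox. Chinese vendors are following as well, including iFlytek's dubbing service, Baidu's speech synthesis, and Tencent Zhiying.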

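► Generative AI and video creation: overseas, Meta's Make-A-Video generates video from text descriptions; Google's Imagen Video and Phenaki target different requirements for video quality and length, and in early February Google released Dreamix, a new video editing method that can edit existing videos and generate video from images plus a description; Runway has also launched its AI video generation model GEN-1. Chinese vendors are experimenting too: Wondershare's Wanxing Bobao generates promotional videos featuring digital humans from keywords; Baidu-incubated VidPress automatically produces video with voice-over, subtitles, and footage from imported text and images; Arcvideo offers AI products for video quality enhancement; and SenseTime Zhiying offers a product that assists with script writing.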

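► Generative AI and 3D modeling: 3D CAD products such as Creo, Autodesk Fusion 360, Solid Edge, and SolidWorks already widely integrate "AI Inside" capabilities, mainly to assist with parameter optimization and sketch generation. In EDA, overseas vendors such as Synopsys and Cadence have explored AI-empowered chip design, training models on existing design data to raise design efficiency.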

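Integrating AI into tool software can improve user experience and productivity and strengthen product competitiveness. Whether by giving users something novel or by raising their productivity, integrating AI is an attractive way for tool software to become more appealing and competitive. And because the cost of trial and error when integrating large models is low in the short term, we expect tool software vendors broadly to be open to adopting these capabilities, so the ecosystem could grow quickly.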

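Objectively, however, today's generative AI still has many shortcomings and appears mainly as an assistive production tool. Generative AI represented by ChatGPT still lacks training on industry-specific corpora, relies on an outdated corpus, and cannot guarantee correct logical reasoning, among other weaknesses. In the short term it therefore serves only as an assistive tool and cannot yet create fully on its own initiative. Users also need to watch for potential copyright disputes, sensitive information, bias, and discrimination. We believe the integration of generative AI with application software is still at an early stage, with substantial room for improvement.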

How will AI-empowered tool software affect the industry ecosystem and competitive landscape in the short term?

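From a product-value perspective, AI-integrated features may become a new paid add-on for tool software. In the short term, tool software vendors can offer AI integration as a differentiating feature and value-added service, charging users extra and raising the ceiling on what users pay. For example, Microsoft's Teams Premium costs US$10 per month and includes features such as automatically generated meeting notes based on GPT-3.5; GitHub Copilot, the Microsoft-owned code generation and editing assistant, is also a separate paid product; and Notion AI's enhanced features are free during the alpha test, though Notion has indicated the official version will very likely be paid.

Chart: AI-enhanced features may become a new paid add-on for tool software, further raising the revenue ceiling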

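From a competitive perspective, both AI-native newcomers and the speed at which incumbents adopt AI will change the current landscape. We regard AGI as a new technological revolution that may disrupt existing industry structures. By analogy with the cloud era, new SaaS vendors such as Salesforce seized the shift to the cloud and rose to overtake established vendors such as SAP, while the success of incumbents such as Oracle and Microsoft in moving to the cloud directly shaped how their market influence evolved. In fact, a group of AIGC-related unicorns is already growing rapidly, and in the coming era of AI-integrated applications we believe both the rise of AI-native newcomers and the results of incumbents' AI transformations could change the competitive landscape.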

Chart: AIGC-related unicorns are growing rapidly and may change the existing landscape

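However, as generative AI applications spread, AI-integrated tool software may become standard. Because tool software vendors do not need to invest in developing large AI models and only need to focus on applying and adapting AI, upfront costs are low. We therefore expect that if early adopters achieve commercial success through AI integration, other players will follow quickly and AI-integrated tool software may become standard. In that case, we believe tool software vendors may no longer be able to charge separately for AI-enhanced features, and the point of differentiation will shift from "whether there is AI" to "how well AI is used".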

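In the future, the depth of AI integration into use cases will become the new focus of competition among tool software vendors. Once AI integration becomes standard, competition will center on finding the use cases best suited to AI and extracting the most value from generative AI. Since all vendors can access the same general-purpose large models, we believe those that integrate AI best with existing use cases and realize the most value from it are likely to win the next round of competition, and entrenched competitive landscapes in some fields may be shaken or even upended.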

Long term: generative AI may reshape business logic, turning production tools into productive capacity

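An ideal AGI could upgrade production tools into productive capacity and reshape the underlying business logic of tool software. In the long run, integrating AGI (artificial general intelligence) into tool software offers great potential, and some in the industry compare AGI to a new "industrial revolution" or a "technological singularity". Ideally, we believe a true AGI of the future will be able to create without depending on commands and guidance from human users. At that point, tool software with AGI's autonomous creative ability will no longer be merely a "production tool" that helps human users work more efficiently, but an independent source of additional "productive capacity".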

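Once AI-empowered tool software becomes productive capacity, it should share directly in the value it produces, with that value split between the underlying AI capability provider and the tool software vendor. We believe that if AI-empowered tool software shifts from providing production tools to providing productive capacity, its business logic will no longer be to charge indirectly for the tool but to share directly in the value produced. For example, for a book written entirely by AI-empowered writing software, both the provider of the underlying general AI capability and the provider of the writing software would be entitled to a share of the book's sales.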

Chart: Generative AI upgrades production tools into productive capacity, bringing a qualitative change in business logic

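In the short term, downstream vendors with scarce AI use cases matter more; in the long term, bargaining power shifts to platform vendors that control the underlying general AI capabilities. In the early stage of AGI exploration, suitable downstream use cases are scarce, and providers of underlying general AI platforms want as many application vendors as possible to integrate, giving them more opportunities to train large models on vertical use cases. In the long run, however, because training large models is technically demanding and costly, we believe that as AGI applications deepen, bargaining power may ultimately shift to a handful of platform vendors with underlying general AI capabilities, which could capture a larger share of the value. Whatever the final split, we believe the business logic of tool software vendors changes qualitatively in this process: they may participate directly in sharing the value produced.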

Chart: How an ideal AGI changes the value distribution logic of tool software

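How can we better understand AI's long-term reshaping of business logic? We compare it with the SaaS model brought about by cloud computing. We believe AI and cloud computing are both epoch-making technological shifts; cloud computing created SaaS as a new business model and changed the competitive landscape of traditional enterprise software. We therefore compare AGI with cloud computing in terms of industry structure, business logic, and other aspects to discuss its possible commercial impact.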

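► Industry structure: compute, models, and AI-integrated applications in AI correspond respectively to IaaS, PaaS, and SaaS in cloud computing. We believe AI has a three-layer structure similar to cloud computing. Training AI models requires powerful underlying hardware, so the compute layer corresponds to IaaS. Large AI models resemble foundational software in serving general-purpose needs, and model APIs are now experimenting with pay-per-use pricing, so MaaS (Model-as-a-Service) corresponds to PaaS. At the top, application software calls large AI models to deliver AI-enhanced vertical functionality directly to businesses and consumers, corresponding to SaaS software that delivers services on top of underlying cloud infrastructure and platform capabilities.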

Chart: Compute, models, and AI-integrated applications in AI correspond respectively to IaaS, PaaS, and SaaS in cloud computing

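► Business logic: cloud computing shifted the industry from selling products to subscription services; AGI may shift it from charging for the use of production tools to sharing directly in the value of the output. Cloud computing moved customers from one-off purchases of basic software and hardware to ongoing payments for services provided by cloud vendors; for suppliers, subscriptions mean better cash flow, more sustainable revenue, and higher total customer spending. As discussed above, if AI-empowered tool software shifts from providing production tools to providing productive capacity, its business logic will move from charging for tool usage to sharing directly in the value produced, which for suppliers also means more sustainable revenue and a higher revenue ceiling.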

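► Competitive landscape: both new entrants and incumbents' ability to adapt to new technology change the existing landscape. Taking the database software market as an example, changes over the past decade were driven mainly by the entry of cloud vendors and cloud-native independent database vendors, and by how well traditional database companies executed their cloud transitions. By analogy, we believe the entry of AI-native tool software vendors and differences in how quickly and effectively incumbents integrate AI may also reshape the competitive landscape.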

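► Value sharing: underlying infrastructure vendors provide general-purpose capabilities, and application vendors on top focus on vertical use cases. In the cloud computing industry chain, IaaS and PaaS vendors provide general-purpose hardware and software infrastructure, while SaaS vendors focus on vertical functional applications. By analogy, underlying AI platform vendors provide general-purpose large-model capabilities, while tool software vendors in the upper layer look for use cases suited to AI empowerment and monetization. As for the compute costs of AI, we believe AI vendors will bear the training cost, while subsequent inference costs will be shared between AI vendors and application software vendors (just as cloud users rent computing resources, the AI industry of the future will rent models and compute).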

Chart: In the long run, AI, like cloud computing, may reshape the business logic of tool software
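The pay-per-use MaaS pricing and the shared inference cost described in the comparison above can be made concrete with a small metering sketch. It is purely illustrative: per-token billing, the unit price, the call sizes, and the 75/25 split of the inference bill are all assumptions, not figures from this report or from any vendor's price list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelCall:
    """One API call from an application to a hosted model (hypothetical record)."""
    prompt_tokens: int
    completion_tokens: int


def inference_bill(calls: List[ModelCall], price_per_1k_tokens: float) -> float:
    """Pay-per-use bill: total tokens consumed times an assumed unit price."""
    total_tokens = sum(c.prompt_tokens + c.completion_tokens for c in calls)
    return total_tokens / 1000 * price_per_1k_tokens


def split_cost(bill: float, app_vendor_share: float) -> Tuple[float, float]:
    """Split an inference bill between the model vendor and the application vendor.

    app_vendor_share is an assumed contractual ratio in [0, 1].
    Returns (model_vendor_part, app_vendor_part).
    """
    if not 0.0 <= app_vendor_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    app_part = bill * app_vendor_share
    return bill - app_part, app_part


calls = [ModelCall(500, 300), ModelCall(1200, 800)]
bill = inference_bill(calls, price_per_1k_tokens=0.002)  # hypothetical price
model_vendor, app_vendor = split_cost(bill, app_vendor_share=0.75)  # hypothetical split
print(f"bill={bill:.4f} model_vendor={model_vendor:.4f} app_vendor={app_vendor:.4f}")
```

With these assumed numbers the two calls consume 2,800 tokens, and the application vendor's share is simply a fixed fraction of the metered bill; real arrangements could instead use tiered prices or revenue shares.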

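Moving to the cloud has become a must for application software, and we believe "AI+" may likewise become a standard feature. Support for cloud deployment is now essentially a required capability for software vendors: most software companies founded after 2010 chose a cloud-native technical route, while traditional software companies have actively moved to the cloud and shifted their business models toward subscriptions. By the same logic, we believe "AI+" is likely to become standard in the new generation of application software, and application vendors will arrive at a new, mature business model through exploration and adjustment alongside AI vendors.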

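Just as cloud computing led to a revaluation of application software once business models were reshaped, AGI may unlock a new round of value. By changing how software is developed, deployed, delivered, and charged for, cloud computing upgraded business models and operating logic, which in turn prompted capital markets to revalue tool software and the application software industry as a whole. We believe generative AI empowering tool software may unlock a new round of value in the long run. In the short term, however, large models still have many flaws, downstream applications and paid use cases are still being explored, and copyright and regulatory questions need further discussion and clarification, so the direction in which our conjecture plays out remains highly uncertain and needs continued tracking and observation.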

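In sum, AI-integrated tool software has broad potential, but real-world adoption still faces many challenges; our view is not to overstate it in the short term and not to underestimate it in the long term. Ultimate adoption depends on advances in underlying compute and large-model algorithms, and legal and ethical questions remain to be discussed and resolved. We believe the future of AGI applications is bright but the road will be winding. We reiterate our view of not overstating in the short term and not underestimating in the long term, and suggest investors keep tracking the latest industry trends and watch the main potential use cases of AI-integrated tool software.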

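Chart: Key AIGC technologies continue to advance and AI-integrated tool software has broad potential; we reiterate our view of not overstating in the short term and not underestimating in the long term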

Source: OpenAI website; Denoising Diffusion Probabilistic M…

Industry practice and trends in generative AI empowering tool software

Generative AI and text creation: ChatGPT may accelerate the adoption of AI writing

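In text creation, generative AI can write, rewrite, correct, and translate. AI writing tools can be trained on the vast text data available on the internet, and the application of large Transformer models to natural language is already relatively mature, so we believe text creation is likely to be one of the first use cases where generative AI takes hold. We observe that Notion and Microsoft have begun integrating AI language models into note-taking and office software, 4Paradigm has launched an AIGC tool for enterprise customers, and Kingsoft Office, China's leading office software vendor, may also bring AI into its products in the medium to long term to improve writing efficiency. We believe generative AI offers four main capabilities in text creation: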

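► Writing: trained on a massive corpus, Transformer neural networks can understand language and generate text, so they can produce coherent, fact-rich passages from a user's simple instructions;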

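► Rewriting: compared with ordinary-sized language models, large language models have some reasoning ability and can form chains of thought to solve abstract problems, so they can rewrite text as the user requests;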

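► Correction: by learning patterns across massive amounts of text, generative AI can fix spelling, grammar, and punctuation errors in a given text, bringing it closer to standard usage;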

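► Translation: generative AI can use neural networks (Transformers today, and recurrent and convolutional networks in earlier systems) to break down structurally complex passages and translate them in context, greatly improving the coherence, accuracy, and readability of translations.

Chart: The four capabilities of generative AI in text creation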

Case 1: Notion AI streamlines writing

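Notion AI can generate rich text from simple instructions. Notion AI is the AI tool for Notion's products; by integrating machine learning and NLP, it helps users write more efficiently and with a better experience. Powered by large AI language models, it lets users simply list their basic requirements and then automatically generates rich text for many scenarios, including meeting agendas, sales emails, and press releases. Notion AI can also summarize, correct, translate, continue writing, and brainstorm; in the future it will also serve as an interface to Notion's knowledge base, automatically surfacing relevant information when users enter a search request. We expect Notion AI's automatic text generation, summarization, and editing features to greatly improve users' creative workflow and experience and to lift Notion's product strength significantly.

Case 2: Microsoft's plan to integrate AI into Office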

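AI is expected to improve the Microsoft Office experience. Microsoft invested US$1 billion in OpenAI in 2019 and established a relatively deep partnership with it. Microsoft recently planned to integrate OpenAI's next-generation language model into Office applications such as Word, PowerPoint, and Outlook, so that users can get automatically generated text from simple instructions. The new Office will offer automatic summarization, content suggestions, and text generation, with a sidebar experience similar to the ChatGPT sidebar in Bing, in which users can interact with a chatbot.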

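Office's huge user base and training data may help its AI capabilities iterate quickly. Office has an obvious scale advantage (1.5 billion PC installations worldwide in 2021). We believe combining OpenAI's AI technology with Office will, on the one hand, give AI a high-quality use case and, on the other, let Office's huge user base supply AI with a continuous stream of training data, creating a flywheel effect that keeps improving AI-assisted writing.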

Case 3: Moli Biaoge offers AI text processing embedded in spreadsheets

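Moli Biaoge uses large AI models to perform "batch computation" on text in spreadsheets. Jointly developed by ModelBest and the open-source large-model community OpenBMB (whose core members come from Tsinghua University), it embeds the text-processing capabilities of large models into functions, so entering a function in a cell calls the model. Supported functions include IE (information extraction), QA (question answering), MT (machine translation), SA (sentiment analysis), and TG (title generation), and they can be combined with basic Excel functions. We believe AI text processing in spreadsheets enables batch computation over text and can greatly improve office productivity.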

Chart: Moli Biaoge brings AI text processing to spreadsheets
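The idea of invoking a model from a spreadsheet formula and filling it down a column can be sketched as follows. The function names SA and TG follow the list above, but the `=SA(A2)` formula syntax, the dispatch table, and the keyword-based stand-ins for the large model are hypothetical and do not describe how Moli Biaoge is actually implemented.

```python
import re
from typing import Callable, Dict, List


def fake_sentiment(text: str) -> str:
    """Hypothetical stand-in for a large-model sentiment call (SA)."""
    positive = {"good", "great", "fast"}
    negative = {"bad", "slow", "broken"}
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & positive) - len(words & negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def fake_title(text: str) -> str:
    """Hypothetical stand-in for title generation (TG): first five words, title-cased."""
    return " ".join(text.split()[:5]).title()


# Dispatch table mapping spreadsheet function names to model-backed handlers.
FUNCTIONS: Dict[str, Callable[[str], str]] = {"SA": fake_sentiment, "TG": fake_title}

# Assumed formula syntax: =NAME(ColumnRow), e.g. =SA(A2)
FORMULA = re.compile(r"^=([A-Z]+)\(([A-Z]+)(\d+)\)$")


def evaluate(formula: str, sheet: Dict[str, str]) -> str:
    """Evaluate a single-cell formula against a dict of cell values."""
    match = FORMULA.match(formula)
    if not match:
        raise ValueError(f"unsupported formula: {formula}")
    name, col, row = match.groups()
    return FUNCTIONS[name](sheet[f"{col}{row}"])


def fill_column(name: str, col: str, rows: List[int], sheet: Dict[str, str]) -> List[str]:
    """'Batch computation': apply the same formula down a column, like a fill handle."""
    return [evaluate(f"={name}({col}{r})", sheet) for r in rows]


sheet = {
    "A1": "Delivery was fast and the product is great",
    "A2": "Support was slow and the app is broken",
    "A3": "Order arrived on Tuesday",
}
print(fill_column("SA", "A", [1, 2, 3], sheet))
```

Swapping `fake_sentiment` for a real model call would leave the dispatch and fill-down logic unchanged, which is what makes a spreadsheet a convenient front end for batch text processing.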

Case 4: 4Paradigm meets enterprise AIGC needs

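4Paradigm has launched Shishuo, an enterprise-grade GPT-like product that helps companies use internal knowledge to solve problems. By combining GPT-like language models with domain knowledge, 4Paradigm designed Shishuo to overcome the limitations of large generative language models inside enterprises and meet enterprise AIGC needs. Shishuo emphasizes three features: 1) data security, addressing enterprise concerns through private deployment; 2) trustworthy content, since Shishuo draws on the enterprise's internal databases and cites the original sources in its answers, making them more credible and reliable; and 3) controllable cost, as its compute cost is relatively manageable and it needs relatively little labeled data. We believe AIGC tools for business (B-end) customers such as Shishuo can help companies reuse their knowledge and improve production and management efficiency.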

Chart: Working interface of 4Paradigm's Shishuo

Case 5: Emotibot uses AIGC for writing, dialogue, knowledge search, and more

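Emotibot has launched ChatGPT-like products for enterprise AIGC applications. Founded in 2015, the company provides AI solutions for six sectors: finance, enterprise, healthcare, manufacturing, smart devices, and government services. In September 2022 it launched AI SaaS products covering customer service, sales, internal enterprise services, and other scenarios, providing cloud-based AI tools for small and medium-sized businesses. It has also kept investing in AIGC: it previously released several intelligent writing tools such as Magic Writer and recently launched the enterprise-grade Gemini GPT product series, including the enterprise chatbot KKBot and the interactive cognitive search engine ChatSearch, applying AI across sales and customer service, human-computer interaction, and knowledge discovery.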

Case 6: Yinxiang Biji uses a self-developed lightweight large model to assist writing

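Built on its self-developed "Daxiang GPT" model, Yinxiang Biji has launched the generative writing tool "Yinxiang AI". Since 2019, the Chinese note-taking app vendor Yinxiang Biji (Evernote China) has been applying AI to note and text processing, successively launching AI tools such as smart recommendations, smart tags, smart summaries, and a knowledge map. It has also kept investing in large models: in 2023 it released Daxiang GPT, a large language model it built in-house drawing on GPT-style large language models such as OPT and BLOOM, and on this basis launched the generative writing module Yinxiang AI embedded in its note-taking product, an early example of a Chinese vendor delivering AI writing with a self-developed model. Going forward, Yinxiang Biji plans to refine the model with reinforcement learning from human feedback (RLHF) and to combine it with personal corpora to support writing in a user's own style.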

Case 7: MiniMax opens up new consumer (C-end) use cases

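Unlike ChatGPT's focus on professional Q&A, MiniMax's Glow focuses on social chat. Founded at the end of 2021, MiniMax has developed its own general-purpose large models in three modalities: text-to-visual, text-to-speech, and text-to-text. In November 2022 it launched Glow, its first AI chatbot platform, where users can chat with existing agents or create an agent from a short description and refine it through subsequent conversation; generating an agent's dialogue, avatar, and voice draws on MiniMax's three modality models. Whereas ChatGPT leans toward information search and text generation, Glow's agents have distinct backgrounds and personalities, and their conversations with users lean toward casual companionship, emotional interaction, and role-play. We believe MiniMax's chatbots interact well with users and are highly sticky, opening up new consumer use cases.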

Case 8: Potential AI use cases at Kingsoft Office

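Kingsoft Office already has a solid footing in AI. China's leading office software vendor has broad technical and business capabilities in computer vision, natural language processing, speech processing, and other AI fields. The company began building an AI middle platform in 2017 and has developed nearly 100 AI capabilities for office scenarios. In natural language processing, it has developed a writing assistant: the user provides an outline and the AI generates text from its corpus and algorithms, which the user can adopt as a first draft, greatly improving writing efficiency. Kingsoft Office has also implemented AI proofreading, translation, and error correction, making them important incremental features of the WPS office suite.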

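We expect Kingsoft Office to keep pace with AI industry trends and move in at the right time, focusing mainly on the AI application side. Its WPS product has a large user base and diverse, complex usage scenarios; if Kingsoft Office digs deep into these scenarios, we believe it can offer AI writing services in email, office work, marketing, government affairs, literature, and other niches, improving user experience and deepening its moat. We expect the company, after fully evaluating the capabilities of China's large-model vendors, to integrate them at an appropriate time and exploit the potential of large AI models in office software as fully as possible.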

Generative AI and audio generation: cross-modal applications move into audio

Overseas case 1: Several Google teams have produced audio generation research

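In 2023, Google released several audio generation models, each with its own features. Earlier attempts at AI music creation included the visual music creation model Riffusion, Google's AudioLM, and OpenAI's Jukebox. The recent work relies on labeled audio data and, in some cases, diffusion models, extracting data features and pairing text with audio to generate audio from text.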

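► MusicLM: a model that generates high-fidelity music from text descriptions, such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts conditional music generation as a hierarchical sequence-to-sequence modeling task and can generate music at 24 kHz that stays consistent over several minutes, outperforming previous models in both adherence to the text description and audio quality. MusicLM can also transform an existing melody according to a text description and generate musical accompaniment that matches paintings and text descriptions.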

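► Noise2Music: applies a series of diffusion models in succession to generate 24 kHz audio clips. To build its training set, it pseudo-labels a large audio dataset: a large language model generates descriptive music text, and a pre-trained joint music-text embedding model assigns matching text to each clip through zero-shot classification. Noise2Music can understand more complex prompt semantics and generate different styles, such as "a contralto singing a slow jazz ballad in a live performance", or imitate instruments such as piano, saxophone, and African drums.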

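► SingSong: generates an instrumental accompaniment from a vocal track, building on vocal source separation and audio generation. Users only need to input their vocals to get a matching instrumental accompaniment. Researchers asked listeners to evaluate the model by comparing two 10-second accompaniments for the same vocals, and SingSong was rated clearly better than the baseline models.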

Overseas case 2: UK academics propose AudioLDM, improving quality and reducing compute

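AudioLDM addresses the limited quality and high compute cost of text-to-audio research. The University of Surrey and Imperial College London jointly released and open-sourced AudioLDM, a framework based on latent diffusion models and contrastive learning. The model improves the quality of text-to-audio generation; it needs only audio data (without paired text) during training yet matches or exceeds models trained on audio-text pairs; its training consumes little compute; and it can transform or imitate sound styles without additional training.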

Domestic case 1: iFlytek launches a new training framework to improve speech prosody

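iFlytek launched the SMART-TTS framework and deployed it on the iFlytek Open Platform, iFlytek Audio, and Xuexi Qiangguo. Rather than directly learning a mapping from text to audio features, SMART-TTS breaks the speech synthesis learning process into modules and strengthens each one with pre-training. The framework offers 11 emotions such as "happy", "apologetic", and "sad", each with 20 levels of intensity, and can control pauses, stress, and speaking rate, enabling digital humans to speak with human-like emotion. In addition, iFlytek's speech synthesis supports 37 languages, 11 Chinese dialects, 2 ethnic minority languages, and natural mixed Chinese-English synthesis.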

Domestic case 2: Unisound, a Chinese AI speech generation unicorn

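Beyond text-to-music, speech synthesis is another important direction in audio generation. Chinese unicorn Unisound offers speech synthesis services including text-to-speech, voice library customization, and voice cloning. Its speech synthesis converts text into natural, fluent speech with a range of voices and emotions and adjustable volume, speed, and pitch; voice library customization serves mainly enterprise customers, using deep learning to create a dedicated branded voice; and voice cloning can quickly produce a voice model whose timbre and speaking style resemble a user's from a small amount of recorded speech. These features suit voice scenarios such as intelligent customer service, smart hardware, news broadcasting, and dubbing for independent media.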

Generative AI and image creation: cross-modality opens up rich possibilities

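In 2022, building on the release and open-sourcing of large models such as CLIP and diffusion models, the deployment of DALL·E 2 and Stable Diffusion made cross-modal generation such as text-to-image the main line of AIGC adoption. With large-model foundations, massive image-text pairs in open-source datasets, the compute of leading vendors, and lower barriers to entry all in place, OpenAI released the upgraded text-to-image model DALL·E 2, bringing AI painting (cross-modal text-to-image generation) into practical use and setting off a wave of AI art. In August 2022, Stability AI open-sourced Stable Diffusion, sharply lowering the barrier to cross-modal AIGC applications in AI painting and opening an era of "industrialized" creation by everyone. Overseas, the application layer has since produced fine-tuned models and plugins such as Midjourney, ChilloutMix, and ControlNet, steadily improving image quality and gradually pushing AI image creation toward commercialization.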

Overseas case 1: DALL·E and DALL·E 2, pioneers of text-to-image

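OpenAI first launched DALL·E and began commercializing the technology through the Azure OpenAI Service in 2021, then released the upgraded DALL·E 2 in April 2022. Thanks to CLIP, the image-text matching model OpenAI released in 2021, DALL·E 2 can relate text to visual images; and by drawing on GLIDE, OpenAI's diffusion-based image generation model, it can produce realistic images from text, with four times the resolution, higher accuracy, and a broader set of functions: 1) generating images from text prompts, 2) generating new images from a given image, and 3) editing image elements with text.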

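DALL·E 2 currently charges by purchased credits: after joining the open beta, users receive 50 free credits in the first month, each credit corresponding to one generation request, and then 15 free credits each month; additional credits currently cost US$15 for 115. Compared with DALL·E, DALL·E 2 not only generates more realistic and accurate images but also depicts scenes more completely and can edit existing images by adding or removing elements through natural language. Compared with other models in the field, DALL·E 2 offers high controllability, excellent handling of spatial relationships, and strong photorealism. Its technical maturity and early launch turned AI painting from imagination into reality; in July 2022 DALL·E 2 opened an invitation-only public beta, an important driver of rising interest in AIGC in 2022.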
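The credit scheme can be turned into a quick monthly cost estimate. The sketch uses only the figures quoted in the paragraph above (50 free credits in the first month, 15 in later months, US$15 per 115 credits, one credit per generation request) and assumes that extra credits are bought in whole packs and that unused credits are not carried over; OpenAI's actual terms may differ.

```python
import math

FREE_FIRST_MONTH = 50   # free credits in the first month (figure from the text)
FREE_LATER_MONTHS = 15  # free credits in each later month (figure from the text)
PACK_CREDITS = 115      # credits per purchased pack (figure from the text)
PACK_PRICE_USD = 15.0   # price per pack (figure from the text)


def monthly_cost(generations: int, first_month: bool) -> float:
    """Cost in USD for a month with the given number of generation requests.

    Assumes one credit per request, purchases only in whole packs,
    and no carry-over of unused credits (simplifying assumptions).
    """
    free = FREE_FIRST_MONTH if first_month else FREE_LATER_MONTHS
    shortfall = max(0, generations - free)
    packs = math.ceil(shortfall / PACK_CREDITS)
    return packs * PACK_PRICE_USD


print(round(PACK_PRICE_USD / PACK_CREDITS, 4))  # marginal price per credit in USD
print(monthly_cost(40, first_month=True))       # fully covered by free credits
print(monthly_cost(200, first_month=False))     # 185 extra credits -> 2 packs
```

For example, a month with 200 requests after the first month exceeds the 15 free credits by 185, which takes two packs, or US$30.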

Overseas case 2: Stability AI open-sources Stable Diffusion and exports AI painting capability

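Founded in 2020, Stability AI reached a post-money valuation of over US$1 billion in 2022 on the strength of launching and open-sourcing Stable Diffusion, becoming a unicorn as early as its seed round. Stable Diffusion is based mainly on the latent diffusion model: it generates an image by iteratively denoising its input and then decoding the output, and it works in a spatially compressed latent space to ease memory and inference-time bottlenecks. This not only lets users generate high-resolution, sharp images quickly on consumer-grade GPUs but has also built an open-source ecosystem that greatly lowers the barrier to use. With this, the open-source ecosystem largely addressed AIGC's data, model, and compute problems, directly lowering the barrier for users and spreading into many vertical fields.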
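The iterative denoising that Stable Diffusion performs in latent space can be illustrated with the standard DDPM reverse update. This is a toy under strong simplifications: a four-number list stands in for a latent, a linear noise schedule is assumed, the reverse process runs deterministically (the random term is dropped), and the trained noise-prediction network is replaced by an "oracle" that knows the clean latent. It therefore shows only the mechanics of the update; the encoder, decoder, and text conditioning of a real latent diffusion model are omitted.

```python
import math
import random

T = 1000
# Assumed linear noise schedule from 1e-4 to 0.02.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)  # cumulative product: abar_t


def add_noise(x0, t, eps):
    """Forward process q(x_t | x_0): x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1 - ab) * e for v, e in zip(x0, eps)]


def oracle_eps(xt, t, x0):
    """Stand-in for the trained network: recovers the noise exactly because it knows x0."""
    ab = alpha_bars[t]
    return [(v - math.sqrt(ab) * c) / math.sqrt(1 - ab) for v, c in zip(xt, x0)]


def reverse_step(xt, t, eps_hat):
    """DDPM mean update: x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t).

    The stochastic sigma_t * z term is omitted to keep the toy deterministic.
    """
    coef = betas[t] / math.sqrt(1 - alpha_bars[t])
    return [(v - coef * e) / math.sqrt(alphas[t]) for v, e in zip(xt, eps_hat)]


random.seed(0)
clean_latent = [0.5, -1.0, 2.0, 0.0]  # pretend this is the target latent
noise = [random.gauss(0.0, 1.0) for _ in clean_latent]
x = add_noise(clean_latent, T - 1, noise)  # start from an almost pure-noise latent
for t in reversed(range(T)):
    x = reverse_step(x, t, oracle_eps(x, t, clean_latent))

print([round(v, 6) for v in x])
```

In a real model, `oracle_eps` would be a neural network (a U-Net in Stable Diffusion) conditioned on the text prompt, and the final latent would be passed through a decoder to obtain pixels.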

Overseas case 3: Midjourney, a phenomenal AI image app with a proven business model

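Midjourney built a closed-source text-to-image model based on CLIP and diffusion and has reached 10 million users and over US$100 million in revenue. The product runs on the Discord community: users invite the Midjourney bot to a channel and enter a prompt beginning with "/imagine" to generate the image they want. With more than 10 million community members, Midjourney gathers feedback from users' choices among the generated results, giving it a large and unique dataset that serves as a competitive barrier. Midjourney images need only short prompts, are high quality, and have a sci-fi flavor, making them popular with designers, Web3 and NFT practitioners, and individual users. It uses a paid SaaS subscription model and is already profitable.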

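Compared with frontier technology overseas, AI image creation in China is at a relatively early stage, but it has made some progress and produced a batch of innovative products and technologies. Represented by Baidu's Wenxin Yige and Wondershare's Wanxing Aihua, these show that China has AI painting capabilities, and through innovations such as "AI sketch-to-image" they have expanded the ways users can create and improved efficiency and user experience.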

Domestic case 1: Baidu's AI painting, built on its ERNIE (Wenxin) large models, rivals overseas offerings

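Wenxin Yige is Baidu's first AI painting product, built on PaddlePaddle and the ERNIE (Wenxin) large models. It generates images from text in more than ten styles, including traditional Chinese, oil painting, watercolor, gouache, anime, and realistic, offering a creative platform for professional content creators while letting entry-level and general users bring their ideas to life. Facing three challenges in real-world use (understanding creative intent, generating original images, and satisfying creative needs), Wenxin Yige introduced three technical innovations: knowledge-based prompt learning, deep cross-modal fusion of text and images, and text-driven image editing, which strengthen creative planning and detail rendering and improve quality through multi-turn interaction.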

Domestic case 2: Wondershare's AIGC painting, a benchmark for Chinese vendors building on OpenAI

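Wondershare, which has focused on overseas markets for 20 years, integrated OpenAI's API to build Wanxing Aihua, a new creative tool for drawing and illustration. Positioned to produce "high-quality AI-generated artwork", it offers two text-to-image modes, random generation and keyword-based creation: users enter keywords and choose an aspect ratio and art style, and receive an AI-generated painting in 30 seconds, in styles such as hand-drawn, cyberpunk, anime, and CG rendering. The product supports prompts in both Chinese and English and lets users emphasize keywords with exclamation marks and parentheses.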

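In February 2023, Wanxing Aihua launched "AI Sketch", billed as the world's first AI painting software that generates images from user-drawn input (image-to-image), marking a new era for AI painting. Compared with earlier methods, sketching places lower demands on the user's prompt: a few simple strokes now yield a high-quality artwork in 5 seconds, and users can also help the model improve through feedback on their image choices. Image-to-image generation from sketches makes users more involved in creation and the process more fun.

Chart: Wondershare's AI painting interface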

Generative AI and video creation: cross-modal progress is still early but could raise the ceiling for applications

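Benchmark cases from overseas tech giants open up possibilities for AI video creation. In September 2022, Meta released Make-A-Video, which generates short videos of a few seconds from a few words or sentences. Just a week later, Google released Imagen Video and Phenaki, aimed respectively at high-quality and long videos. Cross-modal AIGC video generation still has shortcomings: AI-generated videos show obvious flaws such as blurred and distorted objects, and cannot yet produce longer scenes that tell a detailed, coherent story. However, we believe AIGC video generation could achieve technical breakthroughs and raise the ceiling for applications.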

Case 1: Make-A-Video enables cross-modal generation from text to video

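Make-A-Video generates video from text. It builds on Make-A-Scene, the text-to-image model Meta released in July 2022. Given a text input, Make-A-Video generates a video of a few seconds in a variety of styles. Beyond text-to-video, it can also animate one or two input images, i.e., generate video from images.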

Case 2: Google keeps producing results in cross-modal video generation

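Google works on both text-to-video and image-to-video. A week after Meta launched Make-A-Video, Google released Imagen Video and Phenaki: Imagen Video produces higher-quality but shorter videos, while Phenaki produces lower-quality videos that can run over two minutes. In November 2022, Google released its first video combining the two, balancing quality and length. On February 2, 2023, Google proposed Dreamix, a new video editing method that can edit existing videos and generate video from images plus a description.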

Case 3: Runway's GEN-1 model stands out in generated video quality

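GEN-1 generates videos in diverse styles. Founded in 2018, Runway was one of the co-releasers of Stable Diffusion. In February 2023 it launched GEN-1, an AI video generation model that synthesizes new videos by applying the composition and style of an image or text prompt to the structure of a source video, taking another step forward in the quality and length of generated video.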

Chinese vendors: also at an early exploratory stage, helping creators work more efficiently

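Chinese vendors are likewise at an early exploratory stage in video generation. Their use of AIGC in video focuses more on content creation and quality enhancement, changing video attributes and enabling "assembly-line" content creation; it is currently used mostly by businesses (B-end) to make content creators more productive.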

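► Text-to-video: in May 2022, Tsinghua University and the Beijing Academy of Artificial Intelligence (BAAI) released CogVideo, a Transformer-based model and the industry's first open-source text-to-video AI model, though its generated videos have low resolution and limited length, and it currently accepts only Chinese input.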

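► Quality enhancement and restoration: Arcvideo's quality enhancement products are relatively mature, covering frame interpolation, detail enhancement, video quality upscaling, and restoration and colorization of old footage.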

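► Automatic video creation: VidPress, an intelligent video creation tool incubated by Baidu, automatically produces video with voice-over, subtitles, and footage from imported text-and-image links, and already provides intelligent video generation for media organizations such as People's Daily and for end users of platforms such as Baijiahao and Haokan Video.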

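► Intelligent script creation: SenseTime Zhiying's "video element analysis" extracts and analyzes elements in a video, such as characters, scenes, props, and lines, automatically generates a shot-by-shot storyboard script with 98% accuracy, and identifies the elements of viral videos, cutting script-writing time and helping advertisers reduce content production costs.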

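Limited by technical maturity, videos created independently by AI cannot yet be monetized directly with businesses, but AI is already contributing to commercial production. On January 31, 2023, Netflix, Xiaoice's Japanese unit rinna, and WIT STUDIO released "The Dog & The Boy", the first release-grade animated film assisted by AIGC. The roughly three-minute film used AIGC to draw some of its scenes, showing that AI has begun to see commercial use in assisting video production, although it is still some way from being applied to large projects and generating real revenue.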

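In addition, vendors deploying self-developed sparse models in vertical fields are building multimodal product portfolios. Mobvoi, for example, has built a matrix of multimodal AIGC products spanning text, images, speech, video, and digital humans, aiming to provide one-stop content generation tools. After launching its first commercial AIGC product, the dubbing platform Moyin Workshop, in 2020, Mobvoi expanded across AI voice, AI writing, AI image generation, voice and likeness cloning, digital-human video, and other AIGC fields, targeting a wide range of commercial scenarios.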

Generative AI and 3D modeling: built on parametric modeling, empowered by GPT's text processing

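3D modeling in industrial settings places high demands on AI, which generative design cannot yet fully meet. Unlike images and video, 3D models are used mainly in industrial production and require more rigorous, rational modeling, while AI tools such as ChatGPT are currently weak in mathematics and logic, so progress on modeling directly from text descriptions with generative AI has been relatively slow. Moreover, designing large assemblies such as aircraft and ships requires very rigorous processes and parameters, and we believe generative AI design can offer only limited support in such large-scale scenarios. For now, we observe that AI is deployed in 3D CAD and EDA mainly in the form of "AI Inside".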

Generative design in 3D CAD: AI Inside built on parametric modeling

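Generative design in 3D CAD mainly uses AI to produce a large number of candidate models. According to PTC's website, generative design for 3D models works as follows: the designer specifies constraints (such as space, materials, manufacturing method, and cost) and objectives, AI rapidly generates models that meet these requirements, and the designer picks suitable models for further design and optimization, significantly improving design efficiency. We observe two main types of AI application in 3D CAD today: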

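► AI-assisted parameter optimization: typically used to refine a 3D CAD model. Based on CAE simulation results (for example, excessive stress or noticeable deformation in some parts), constraints can be added to other areas, a large number of candidate parameters generated for the parts to be optimized, and the best selected to obtain an optimized result.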

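► AI sketch generation: for example, the xDesign module of CATIA and SolidWorks introduced AI-assisted sketch creation, in which the system recommends shapes given parameters and materials. This helps engineers create the underlying geometry to some extent and speeds up the overall design.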

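Generative design in 3D CAD is built on parametric modeling. Parametric modeling has a long history: PTC's Pro/E, released in 1987, first introduced history-based parametric modeling, and today all mainstream 3D CAD products offer it. Both AI-assisted parameter optimization and sketch generation essentially generate large numbers of parameter sets under given constraints and then turn them into design options for the designer to choose from. Mainstream 3D CAD products such as CATIA, NX, Pro/E, SolidWorks, and Solid Edge all have AI modules for design assistance.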
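The loop described above, generating many parameter sets under constraints and shortlisting those that meet the objective, can be sketched in a few lines. The example is hypothetical: a rectangular cantilever beam whose width b and height h are sampled at random, filtered by a root bending-stress limit (sigma = 6FL / (b h^2)), and ranked by cross-sectional area as a proxy for mass. The load, length, stress limit, and bounds are illustrative values; commercial tools use far richer geometry, simulation, and search.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    width_mm: float
    height_mm: float

    @property
    def area_mm2(self) -> float:
        return self.width_mm * self.height_mm


def root_bending_stress(c: Candidate, load_n: float, length_mm: float) -> float:
    """Max bending stress (MPa) at the root of a cantilever with a tip load:
    sigma = M / S, with M = F * L and section modulus S = b * h^2 / 6."""
    moment = load_n * length_mm
    section_modulus = c.width_mm * c.height_mm ** 2 / 6
    return moment / section_modulus


def generative_design(n_samples, load_n, length_mm, allowable_mpa, bounds, top_k=3, seed=42):
    """Sample parameter sets, keep those meeting the stress constraint, rank by area."""
    rng = random.Random(seed)
    (w_lo, w_hi), (h_lo, h_hi) = bounds
    feasible = []
    for _ in range(n_samples):
        c = Candidate(rng.uniform(w_lo, w_hi), rng.uniform(h_lo, h_hi))
        if root_bending_stress(c, load_n, length_mm) <= allowable_mpa:
            feasible.append(c)
    # Lightest feasible designs first; the designer picks from this shortlist.
    return sorted(feasible, key=lambda cand: cand.area_mm2)[:top_k]


options = generative_design(
    n_samples=5000, load_n=1000.0, length_mm=500.0, allowable_mpa=150.0,
    bounds=((5.0, 50.0), (10.0, 100.0)),
)
for c in options:
    print(f"w={c.width_mm:.1f} mm  h={c.height_mm:.1f} mm  area={c.area_mm2:.0f} mm^2")
```

For these numbers the constraint requires b·h² ≥ 20,000 mm³, and with b ≥ 5 mm no feasible design can have an area below about 316 mm², so that bound is where the shortlist handed to the designer must sit.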

AI Inside in EDA: using existing design data to improve design efficiency

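AI could help chip design become truly "automated". Even in the relatively automated digital chip design flow, today's EDA tools still require a great deal of manual work by designers; we believe the higher degree of automation AI brings could reduce repetitive work and further free up designers' productivity. AI's empowerment of EDA tools can be divided into two levels: AI Inside generally refers to AI enhancing the design software itself, making the tools smarter and more efficient; AI Outside, by contrast, means machines accumulate experience through learning and can to some extent replace human work, becoming a new source of "productive capacity".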

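The chip design back end, especially placement and routing, is the main use case for AI Inside in EDA. In the digital chip design flow, placement and routing, the most important back-end step, determines the physical shapes and positions of logic cells, and engineers must weigh many factors such as netlist graph nodes, grid granularity, and routing density. Placement and routing is therefore typically one of the most time-consuming steps in digital chip design, and AI image recognition and optimization algorithms could significantly improve design efficiency. Leading overseas EDA vendors such as Cadence and Synopsys already offer AI Inside capabilities for chip design: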

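► Cadence: in March 2020, Cadence released an updated digital full-flow toolset that uses iSpatial technology to connect its place-and-route tool Innovus with its front-end synthesis tool Genus, and integrates machine learning; users can train iSpatial on existing design data to minimize design margins in the place-and-route flow.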

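► Synopsys: in 2020, Synopsys released DSO.ai, an AI application for EDA. According to the company's website, Design Space Optimization (DSO) uses machine learning algorithms to search large design spaces and can optimize the input parameters and choices of chip design workflows to meet a project's exact needs[1]; we believe it is essentially similar to parameter optimization in 3D CAD design.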
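Placement quality is commonly estimated with half-perimeter wirelength (HPWL): for each net, the half-perimeter of the bounding box enclosing its pins, summed over all nets. The sketch below computes HPWL for a made-up four-cell placement and improves it with greedy random swaps. It is a generic illustration of the optimization problem that AI-assisted place-and-route targets, not a description of how Innovus, iSpatial, or DSO.ai work internally.

```python
import random
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[int, int]]  # cell name -> (x, y) grid location


def hpwl(placement: Placement, nets: List[List[str]]) -> int:
    """Half-perimeter wirelength: sum over nets of bounding-box width + height."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_swap(placement: Placement, nets: List[List[str]], iters: int = 2000, seed: int = 1) -> Placement:
    """Try random swaps of two cells' locations; keep a swap only if HPWL drops."""
    rng = random.Random(seed)
    best = dict(placement)
    best_cost = hpwl(best, nets)
    cells = list(best)
    for _ in range(iters):
        a, b = rng.sample(cells, 2)
        best[a], best[b] = best[b], best[a]
        cost = hpwl(best, nets)
        if cost < best_cost:
            best_cost = cost
        else:
            best[a], best[b] = best[b], best[a]  # undo the swap
    return best


# Hypothetical cells on the corners of a 3 x 3 square, and the nets connecting them.
initial = {"u1": (0, 0), "u2": (3, 3), "u3": (0, 3), "u4": (3, 0)}
nets = [["u1", "u2"], ["u3", "u4"], ["u1", "u3"]]
improved = greedy_swap(initial, nets)
print(hpwl(initial, nets), "->", hpwl(improved, nets))
```

Here the swaps bring HPWL from 15 down to 9, the minimum for this layout, since every net must span at least one side of the square. Real placers handle millions of cells and combine wirelength with density and timing objectives, which is where learned models are applied.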

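Looking ahead, AI Outside may achieve truly automated chip design at a higher level. Unlike AI Inside, which enhances EDA tools, AI Outside focuses on the tool user: EDA tools learn human design patterns and accumulate design experience, ultimately reducing manual intervention and freeing up productivity. Synopsys and Cadence are both exploring AI Outside for design automation, and we believe the main obstacle at this stage is the cost of obtaining data. Training for AI Outside requires highly reliable chip data, which is hard to obtain from chip design companies; we believe EDA vendors, through their close ties with foundries, may be able to train on process data and gradually move toward the AI Outside goal.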

Integrating generative design with large GPT models: a potential path from text to models

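A vision for combining generative design with large GPT models: turning text descriptions into parameters. We believe large models such as GPT still have considerable room for application in 3D model design. A potential future direction is to use ChatGPT's text-processing ability to understand a designer's written requirements, that is, to interpret and convert a text description into a set of model parameters, and then obtain corresponding design options through generative design in 3D CAD.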

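► Generative design is an existing technical foundation. 3D generative design can already perform parameter optimization and sketch generation; as the technology matures, we believe the step from given parameters to a generated 3D model may not be the bottleneck in going from text to model.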

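► Converting text into parameters is the biggest difficulty in generating models from text. Today's Transformer models are best at natural language processing, and we believe converting text into the parameters a designer needs is a major challenge; removing the bottleneck between text descriptions and parameter descriptions could pave the way from text to model. A 2021 DeepMind paper discussed the possibility of bridging graphics and sequences, using the natural language processing capabilities of Transformer models to generate CAD sketches.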
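The text-to-parameter step identified above as the main bottleneck can be illustrated with a deliberately crude stand-in. The sketch uses regular expressions instead of a language model to pull a few requirements out of a designer's sentence into a parameter dictionary that a generative-design routine could consume. The parameter names, patterns, and material list are hypothetical; handling free-form descriptions, units, and ambiguity is exactly what would require a large model.

```python
import re
from typing import Dict

# Hypothetical patterns: each maps a parameter name to a regex capturing a number.
PATTERNS = {
    "max_length_mm": r"(?:no longer than|at most|max(?:imum)? length of)\s*(\d+(?:\.\d+)?)\s*mm",
    "max_mass_kg": r"(?:under|below|lighter than)\s*(\d+(?:\.\d+)?)\s*kg",
    "load_n": r"(?:carry|support|withstand)\s*(?:a load of\s*)?(\d+(?:\.\d+)?)\s*N\b",
}
MATERIALS = ("aluminium", "aluminum", "steel", "titanium")


def text_to_parameters(description: str) -> Dict[str, object]:
    """Extract design parameters from a text description (rule-based stand-in for an LLM)."""
    params: Dict[str, object] = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, description, flags=re.IGNORECASE)
        if match:
            params[name] = float(match.group(1))
    lowered = description.lower()
    for material in MATERIALS:
        if material in lowered:
            params["material"] = material
            break
    return params


spec = ("Design a steel bracket no longer than 120 mm that can support 800 N "
        "and weighs under 0.5 kg.")
print(text_to_parameters(spec))
```

Any requirement phrased differently (for example "12 cm long at most") slips through these patterns, which is one way to see why the report treats this conversion as the hard part.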

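DeepMind used the natural language processing capabilities of Transformer models to generate sketches. Sketches form the skeleton of a 3D model, using specific constraints to define how entities keep their shape under parameter changes. In a 2021 paper, DeepMind discussed the similarity between CAD sketching and natural language modeling and proposed a machine learning model that automatically generates CAD sketches, performing well in unconditional synthesis and image-to-sketch translation. The paper's highlight is establishing a correspondence between drawings and sequences, which allows large Transformer models to process sketches as sequences. We believe that as applications of large Transformer models deepen, their integration with CAD will continue to advance, and applications that generate higher-level models from text may emerge.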
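The key idea attributed above to the DeepMind paper, treating a CAD sketch as a sequence so that a Transformer can model it like text, can be shown with a toy serializer. The token vocabulary (primitive tags, quantized coordinates, constraint tags with index references) is invented for this illustration and is not the paper's encoding; the point is only that primitives and constraints can be flattened into one token stream and recovered from it.

```python
from typing import List, Tuple

# Hypothetical sketch format: line primitives with 2D endpoints, then pairwise
# constraints that refer to primitives by their index in the list.
Point = Tuple[float, float]
Line = Tuple[str, Point, Point]
Constraint = Tuple[str, int, int]

GRID = 0.5  # quantization step: coordinates become a finite set of tokens, like words


def quantize(v: float) -> str:
    return f"<q{round(v / GRID)}>"


def serialize(lines: List[Line], constraints: List[Constraint]) -> List[str]:
    """Flatten primitives, then constraints, into one token sequence (lines only)."""
    tokens = ["<bos>"]
    for _kind, (x1, y1), (x2, y2) in lines:
        tokens += ["LINE", quantize(x1), quantize(y1), quantize(x2), quantize(y2)]
    for kind, i, j in constraints:
        tokens += [kind.upper(), f"<ref{i}>", f"<ref{j}>"]
    tokens.append("<eos>")
    return tokens


def deserialize(tokens: List[str]) -> Tuple[List[Line], List[Constraint]]:
    """Inverse of serialize for well-formed sequences whose coordinates lie on the grid."""
    lines: List[Line] = []
    constraints: List[Constraint] = []
    k = 1  # skip <bos>
    while tokens[k] != "<eos>":
        if tokens[k] == "LINE":
            x1, y1, x2, y2 = (int(t[2:-1]) * GRID for t in tokens[k + 1:k + 5])
            lines.append(("line", (x1, y1), (x2, y2)))
            k += 5
        else:
            i, j = (int(t[4:-1]) for t in tokens[k + 1:k + 3])
            constraints.append((tokens[k].lower(), i, j))
            k += 3
    return lines, constraints


# Two perpendicular lines meeting at a corner.
lines = [("line", (0.0, 0.0), (2.0, 0.0)), ("line", (2.0, 0.0), (2.0, 1.5))]
constraints = [("perpendicular", 0, 1)]
tokens = serialize(lines, constraints)
print(tokens)
```

A sequence model trained on many such token streams could then generate sketches token by token, in the same way a language model generates words.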
Risks

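Technological progress falls short of expectations: AI is a frontier technology still in a period of rapid development, and its progress is uncertain; if progress falls short of expectations, industrialization may be slow.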

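Commercialization proceeds more slowly than expected: commercialization is key to whether AI can move smoothly to its next stage; if it proceeds more slowly than expected, AI's progress will suffer.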

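Industry competition intensifies: AI is an industry hot spot with significant future commercial value, and tech giants and startups alike are entering the field, so competition in vertical and application layers may intensify further.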

Editor: irisz

Authors: Yu Zhonghai, Wang Zhihao, Wei Chuanfei, Han Rui, Hu Anqi, Tan Zhe Xian

Source: the finishing touch of Zhongjin

ChatGPT is a step closer to AGI, general artificial intelligence enabling application software becomes possible, which combined with tool software has a wide imagination space. For application software manufacturers, the short-term cost of accessing the AI large model is low, and the long-term imagination is large, so we observe that the vast majority of application software manufacturers actively access the large model. At present, the capability of AI large model represented by ChatGPT mainly lies in human-computer interaction AI and generative AI, and its combination with application software mainly covers AI+ tool software, AI+ search engine, AI+ service application, AI+ vertical industry application and so on. We believe that generative AI has a natural fit with tool software, and there is a broad space for downstream application scenarios and imagination.

Short-term dimension: the integration of generative AI improves production efficiency and becomes the new focus of tool software competition. At present, generative AI mainly helps users improve production efficiency by embedding existing tools and software. many manufacturers have participated in exploration and practice in the fields of text (such as Notion AI), pictures (such as Stable Diffusion, Midjourney), video (such as Make-A-Video), 3D model creation, audio and so on. We believe that, from the perspective of product value, the function of AI converged applications may become the incremental payment point of tool software; from the perspective of competition, the speed of new and traditional manufacturers born in AI to follow up the integration of AI applications will change the existing pattern. However, with the popularity of generative AI applications, AI fusion tools may become "standard" in the future, and the application depth of AI fusion scenarios will become a new focus of competition.

Long-term dimension: generative AI may reshape business logic and realize the transition from production tools to productivity. Ideally, we believe that in the future, the real AGI will be able to create without relying on the command and guidance of human users, and the tools and software enabled by AI may complete the transformation from the production tools to the productivity providers. at that time, the underlying AI capability providers and tool software vendors will jointly participate in the distribution of production value. In order to better understand AI's reshaping of business logic from a long-term perspective, we compare AGI with cloud computing from the perspectives of industrial structure, business logic, competition pattern and value sharing. We believe that just as "going to the cloud" has become a "compulsory course" for application software at present, "AI+" may also become standard for application software in the future, and bring a new round of value release.
Risk

The technological progress is not as expected, the commercial landing rhythm is not as expected, and the competition in the industry is intensified.

Text

AGI large model is getting better and better, creating AI depth enabling tool software

ChatGPT is one step closer to AGI, making it possible for general artificial intelligence enabling applications.

ChatGPT has set off an upsurge of global AI, and the road to AGI may be approaching. ChatGPT (Chat Generative Pre-Trained Transformer) is an artificial intelligence chat robot program developed by OpenAI. It is based on the GPT-3.5 model and can complete relatively complex language processing tasks, including man-machine dialogue, automatic text generation, automatic summary, coding and so on. It was launched in November 2022. Two months after its launch, the number of users reached 100 million, setting off another round of AI craze around the world. The popularity of ChatGPT makes the industry realize that the AI industry is a step closer on the road to AGI (General artificial Intelligence), which in turn leads to worldwide discussion and imagination on how AGI will reshape various industries in the future.

All kinds of application software manufacturers around the world actively embrace the new ecology of artificial intelligence represented by OpenAI. After the launch of ChatGPT, Microsoft Corp plans to invest an additional 10 billion US dollars in OpenAI and explore converged application scenarios in his search and office software. Due to the huge application potential and possibility of ChatGPT, the vast number of application manufacturers around the world have also begun to actively try to access the technical interface of OpenAI, in the hope that AI and its existing products can produce new chemical reactions. The domestic market also quickly followed up. Baidu, Inc. announced that Wenxin, its target product, would complete internal testing in March and be open to the public. at present, hundreds of domestic enterprises have announced access to Wenxin, including Hande Information, Kingdee, Yuxin and other enterprise service software manufacturers. At the same time, we expect that more large models will continue to appear at home and abroad to attract more application software manufacturers to enrich and expand the AI ecology.

For application software manufacturers, the short-term cost of accessing AI large model is lower, and the long-term imagination space is larger. At present, large models such as ChatGPT are in the initial stage of release, and the exploration of business model is just beginning. At this stage, the focus is on ecological construction rather than commercial realization, so no matter OpenAI, Baidu, Inc. or other large model manufacturers, they all keep an open attitude to the interface calls of application software vendors in the short term. This means that for application software vendors, the cost of accessing AI in a short period of time is not high, while AI is quite imaginative about the positive changes that their product form and business logic can bring in the long run.Therefore, we observe that the vast majority of application software manufacturers are actively accessing the ability of large models, and the number of related applications is expanding exponentially.。

At present, the ability of AI large model represented by ChatGPT mainly lies in human-computer interaction AI ability and generative AI ability, and its combination with application software mainly covers the following directions:

► AI+ tool software: assist text, picture, video production. The integrated application of AI and authoring tool software mainly gives full play to the generative AI ability of large LLM models such as ChatGPT, which can complete text generation, picture generation, video generation and other auxiliary creative tasks based on user input instructions and guidance. At present, typical application representatives include Notion AI and Office in text category (Microsoft Corp plans to access ChatGPT), Stable Diffusion (owned by Stability AI), Midjourney, DALL-E (owned by OpenAI), Imagen (owned by Alphabet Inc-CL C) and Designs.ai, Make-A-Video (owned by Meta), Lumen5 in video category.

► AI+ search engine: use natural language processing to transform traditional search clicks into interactive question-and-answer forms and generate personalized results. The combination of AI and search engine is mainly based on the human-computer interactive dialogue ability of natural language processing, to help users solve problems in the form of question and answer, and immediately generate personalized planning, suggestions, analysis and so on. Typical representatives include Microsoft Corp NewBing search engine, which adds interactive chat and auxiliary writing functions after being connected to ChatGPT.

► AI+ service applications: give full play to human-computer interaction to improve self-service experience. The combination of AI and service applications is mainly in the form of self-help question and answer chat robot, giving full play to the human-computer interaction ability of the LLM model. Typical representatives include intelligent question and answer and help robots in various service application scenarios, such as e-commerce, games, maps and so on.

► AI+ vertical industry applications: combined with existing vertical industry applications, it essentially belongs to the above three forms. Typical representatives include Yuxin, hand, Kingdee, Hang Seng Electronics, etc., which integrate the human-computer interaction and creation ability of large models to achieve more efficient information acquisition, analysis, and the formation of intelligent solutions. We believe that, in essence, it can also be classified as one of the above three forms. In the future, we need to further explore how to better combine with the vertical scene, and carry out more in-depth training for the industry corpus in order to achieve better results.

Chart: the main merging direction of AI large model and application software

The investment of application software manufacturers in the field of AI will focus more on the exploration of AI application scenarios and the integration with existing applications. From the perspective of the division of labor of the whole AI industry chain, we think that in the future, large model manufacturers will undertake most of the underlying algorithm development and optimization work, while application software manufacturers will focus more on the exploration and deep ploughing of application scenarios, as well as deeper integration with existing AI models. As to whether general artificial intelligence can be industrialized in the future, we think that both the advanced bottom model and the matching upper application are indispensable, and there will be a closer and reasonable division of labor and cooperation between AI manufacturers and application software manufacturers in the future.

In the above application directions and scenarios, we pay more attention to the possibility of combining generative AI with tools. In the application of large model represented by ChatGPT, generative AI is a more outstanding ability, and it has a natural fit with the existing tool software (text creation tools, picture creation tools, 3D model creation tools, etc.), and the downstream application scene is broad and imaginative. Therefore, in this report, we will focus on the enabling of generative AI to tools and the possibility of long-term reshaping the underlying business logic and industrial ecology of tools.

What possibilities does generative AI open up for tool software? In the short term, it is mainly embedded in existing tool software as an innovative auxiliary feature that helps users work more efficiently, and vendors can charge extra for it as a value-added service. In the long term, generative AI may become capable of creating proactively without user guidance. If so, it could evolve from a production tool into a productive force and genuinely take over part of the work of "creators". Our stance is therefore: not overhyped in the short term, but not to be underestimated in the long term.

Chart: Applications worldwide that have integrated, or plan to integrate, language models such as OpenAI's models and ERNIE Bot (Wenxin Yiyan)

Short-term dimension: integrating generative AI to raise productivity becomes a new focus of tool software competition

At present, generative AI mainly improves user productivity by being embedded in existing tool software. Tool software with generative AI can assist creation within the framework, instructions and guidance the user specifies. This cuts down repetitive, mechanical, rule-based work, and AI can even take on some creative work. Examples include drafting text from existing material according to guidelines, generating images and videos from text descriptions, and assisting with parameter optimization in 3D modeling. Many vendors are already exploring this across modalities, including text, 2D images, 3D models, audio and video.

► Generative AI and text creation: Overseas, Notion has a built-in AI writing assistant that generates text for different scenarios from a user's description, and Microsoft plans to bring ChatGPT capabilities into Office. In China, Kingsoft Office's WPS offers document proofreading, full-text translation and assisted writing. Beyond consumer (C-end) applications, some vendors have built AI-assisted text creation products specifically for enterprises. A typical example is 4Paradigm's Shi Shuo. It combines large generative language models such as GPT with an enterprise's in-house vertical-domain knowledge and supports private deployment. This meets enterprise requirements for industry knowledge, data security and content credibility.

► Generative AI and image creation: Many overseas companies have launched text-to-image products, including OpenAI's DALL-E 2, Stability AI's Stable Diffusion, and Midjourney. The workflow is broadly similar: the user enters keywords, the tool generates several AI images, and the user can then refine them or add details. Each product has its own style. DALL-E 2 leans toward realism and Midjourney toward a science-fiction look. Stable Diffusion has no fixed style and can be steered through repeated, detailed prompts. Chinese vendors have followed, for example Kunlun Tech with Tiangong Qiaohui and Wanxing Technology (Wondershare) with its AI painting tool.

► Generative AI and audio creation: Overseas, Google released AudioLM in October 2022, which generates audio in a similar style from an input audio clip. In January 2023 it launched MusicLM, which generates music directly from text and images. Microsoft released VALL-E in January 2023, which can imitate a person's voice while preserving the speaker's emotion and tone. Other examples include Stability AI's Dance Diffusion and OpenAI's Jukebox. Chinese vendors have also followed suit, including iFLYTEK's dubbing service, Baidu's speech synthesis and Tencent Zhiying.

► Generative AI and video creation: Overseas, Meta's Make-A-Video generates video from text descriptions. Google's Imagen Video and Phenaki target different requirements for image quality and video length, respectively. In early February 2023 Google also released Dreamix, a new video editing method that can edit existing videos or generate videos from images plus descriptions. Runway, for its part, has launched the AI video generation model Gen-1. Chinese vendors have experimented as well. Wanxing Technology's Wanxing Bobao generates digital promotional videos from keywords. VidPress, incubated by Baidu, automatically produces videos with voice-over, subtitles and visuals after images are imported. Danghong Technology (Arcvideo) offers AI products for video quality enhancement, and SenseTime's Shangtang Zhiying offers assisted intelligent script creation.

► Generative AI and 3D model creation: 3D CAD products such as Creo, Autodesk Fusion 360, Solid Edge and SolidWorks have widely integrated built-in ("AI Inside") capabilities, mainly to assist with parameter optimization and sketch generation. In EDA, overseas vendors such as Synopsys and Cadence have explored AI-assisted chip design, training models on existing design data to improve design efficiency.

Integrating AI can improve the user experience and productivity of tool software and make products more competitive. Whether the goal is to give users a sense of novelty or to raise their productivity, adding AI is an attractive way for tool software to become more appealing and competitive. Connecting to large models also carries low trial-and-error costs in the short term. We therefore expect most tool software vendors to be open to adopting these capabilities, and the ecosystem to grow rapidly.

Objectively, however, today's generative AI still has many shortcomings and serves mainly as an auxiliary production tool. Generative AI represented by ChatGPT lacks training on industry-specific corpora, and its training data lags behind current events. The correctness of its logical reasoning also cannot be guaranteed. In the short term it can therefore only act as an auxiliary tool, without the ability to produce or create on its own initiative. Users must also watch for potential copyright disputes, sensitive information, and bias or discrimination. We believe the integration of generative AI and application software is still in its infancy, with ample room for improvement.

What will AI-enabled tool software mean for the industry ecosystem and competitive landscape in the short term?

From a product value perspective, AI-integrated features may become a new paid add-on for tool software. In the short term, vendors can offer AI-integrated features as differentiating functions and value-added services, charge users for them, and raise the ceiling on what users pay. For example, Microsoft's Teams Premium costs US$10 per user per month and includes features such as GPT-3.5-based automatic meeting notes. GitHub Copilot, Microsoft's AI assistant for generating and editing code, is also charged separately. Notion's AI features are free during the current alpha test, but the company has indicated that the official release is likely to be paid.

Chart: AI features may become a new paid add-on for tool software, further raising the product revenue ceiling

From a competitive perspective, the existing landscape will shift according to how quickly AI-native newcomers and incumbent vendors adopt AI integration. We view AGI as a new technological revolution that may disrupt the traditional industry structure. Consider the cloud computing era as an analogy. Emerging SaaS vendors such as Salesforce seized the cloud trend and rose rapidly to overtake established software vendors such as SAP. Meanwhile, the results of cloud transformation at incumbents such as Oracle and Microsoft directly shaped how their market influence evolved. A number of AIGC unicorns are already growing rapidly. In the coming era of AI-integrated applications, we believe the rise of AI-native vendors and the outcome of incumbents' AI transformation may change the competitive landscape.

Chart: Fast-growing AIGC unicorns may change the existing landscape

However, as generative AI applications become widespread, AI-integrated features may become "standard" in tool software. Tool software vendors do not need to invest in developing large AI models; they only need to apply and adapt AI integration, so upfront costs are low. We therefore expect other players to follow quickly if early movers achieve commercial success by integrating AI, and AI integration may become a standard feature. In that case, vendors may no longer be able to charge separately for AI features. Competitive differentiation would then shift from "whether a product has AI features" to "how well it uses AI".

Going forward, how deeply AI is applied in specific scenarios will become the new focus of competition among tool software vendors. Once AI integration is standard, vendors will compete on finding the scenarios best suited to AI and getting the most out of generative AI. All vendors can access the same general large-model capabilities. We therefore believe the winners of the next round will be those that integrate AI better with existing scenarios and extract more value from it. The established landscape in some segments may be disrupted or even overturned.

Long-term dimension: generative AI may reshape business logic and turn production tools into productive forces

Ideally, AGI could upgrade production tools into productive forces and reshape the underlying business logic of tool software. Over the long run, tool software integrated with AGI (artificial general intelligence) offers even more room for imagination, and some in the industry compare AGI to a new "industrial revolution" or a "technological singularity". In the ideal case, we believe a true AGI will be able to create without relying on human users' commands and guidance. At that point, tool software with AGI's independent creative ability will no longer be merely a "production tool" that helps human users work more efficiently. It will become an independent, incremental "productive force".

Once it becomes a productive force, AI-enabled tool software should participate directly in distributing the value it produces, sharing that value between the underlying AI capability provider and the tool software vendor. Suppose AI-enabled tool software completes the transition from production-tool provider to productivity provider. Its business logic would then no longer be to charge indirectly for use of the tool, but to share directly in the value produced. For example, if AI-enabled text creation software writes an entire book, both the underlying general AI provider and the software vendor would be entitled to a share of the book's sales.
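Taking a direct share of value means splitting each sale of an AI-created work among the parties involved. The sketch below is minimal and assumes whole-percent shares. The party names and the 30/20/50 split are hypothetical, since the report does not specify proportions. Amounts are handled in integer cents so the parts always add back up to total sales.

```python
# Hypothetical sketch: splitting sales of one AI-created work among parties.
# Party names and percentages are illustrative assumptions, not from the report.


def distribute(sales_cents: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Allocate sales (in integer cents) by whole-percent shares that sum to 100."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    alloc = {party: sales_cents * pct // 100 for party, pct in shares_pct.items()}
    # Integer division can leave a few cents unassigned; give them to the
    # first listed party so the allocations always add up to total sales.
    first = next(iter(alloc))
    alloc[first] += sales_cents - sum(alloc.values())
    return alloc


if __name__ == "__main__":
    # A book selling 100,000.00 (10,000,000 cents), split 30/20/50.
    split = {"ai_provider": 30, "tool_vendor": 20, "publisher_and_others": 50}
    print(distribute(10_000_000, split))
    # → {'ai_provider': 3000000, 'tool_vendor': 2000000, 'publisher_and_others': 5000000}
```

Assigning leftover cents to a single party is a simplifying choice. A real contract would specify its own rounding rule.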

Chart: Generative AI upgrades production tools into productive forces, bringing a qualitative change in business logic

In the short term, downstream vendors that own scarce AI application scenarios hold the key. In the long run, bargaining power will shift to the platform vendors that control the underlying general AI capabilities. In the early stage of AGI exploration, suitable downstream scenarios are scarce. General AI platform vendors therefore want to connect as many application vendors as possible, gaining more chances to train their large models on vertical scenarios. Over time, however, training large models demands advanced technology and heavy spending, and AGI applications will deepen. We therefore think bargaining power may ultimately shift to a small number of platform vendors with general AI capabilities, which would capture a larger share of the value. Whatever the final split, we believe the business logic of tool software vendors will have changed qualitatively in the process: they may take a direct share of the value of production.

Chart: Ideally, AGI changes the logic of value distribution for tool software

How can we better understand AI's long-term reshaping of business logic? We compare it with the SaaS model that cloud computing brought about. We regard both AI and cloud computing as epoch-making technological shifts: cloud computing created SaaS as a new business model and changed the competitive landscape of traditional enterprise software. We therefore compare AGI with cloud computing in terms of industry structure, business logic, competitive landscape and value sharing, and discuss its possible business impact.

► Industry structure: computing power, models and AI-integrated applications correspond to IaaS, PaaS and SaaS in cloud computing, respectively. AI mirrors cloud computing's three-layer structure. Training AI models requires strong underlying hardware, so the computing-power layer corresponds to IaaS. Large AI models resemble foundational software that serves general-purpose needs, and their APIs are moving toward usage-based pricing. This MaaS (Model-as-a-Service) layer corresponds to PaaS. At the top, application software calls AI models to deliver AI-enhanced vertical-scenario features to enterprises and consumers, just as SaaS software delivers services on top of the underlying cloud infrastructure and platform capabilities.
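A small billing sketch makes usage-based pricing at the MaaS layer concrete. It assumes pricing per 1,000 tokens, with separate input and output rates. The rates are placeholders rather than any vendor's actual price list, and the function names are hypothetical. `Decimal` keeps the currency arithmetic exact.

```python
from decimal import Decimal

# Hypothetical usage-based (MaaS) price list: currency units per 1,000 tokens.
# The numbers are placeholders, not any vendor's actual prices.
PRICE_PER_1K_INPUT = Decimal("0.002")
PRICE_PER_1K_OUTPUT = Decimal("0.002")


def call_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one model API call under per-token pricing."""
    return (Decimal(input_tokens) * PRICE_PER_1K_INPUT
            + Decimal(output_tokens) * PRICE_PER_1K_OUTPUT) / 1000


def monthly_bill(calls: list[tuple[int, int]]) -> Decimal:
    """An application's model bill: per-call costs summed over (input, output) token counts."""
    return sum((call_cost(i, o) for i, o in calls), Decimal(0))


if __name__ == "__main__":
    # An AI feature in a SaaS app that made three calls of different sizes.
    print(monthly_bill([(1500, 500), (800, 200), (3000, 1000)]))
```

Under this model the application vendor's cost scales with usage instead of being a fixed license fee, in the same way pay-as-you-go PaaS billing works.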

Chart: Computing power, models and AI-integrated applications map to IaaS, PaaS and SaaS in cloud computing, respectively

► Business logic: cloud computing shifted the model from selling products to subscribing to services. AGI may shift it from paying to use production tools to productive forces sharing directly in value. With cloud computing, customers moved from one-off buyouts of hardware and software products to continuous payments for services delivered by cloud vendors. For suppliers, subscriptions mean better cash flow, more sustainable revenue, and higher total customer spending. As discussed above, AI-enabled tool software may complete the transition from supplying production tools to supplying productivity. Its business logic would then shift from charging for tool usage to sharing directly in the value produced. That would likewise mean more sustainable revenue and a higher revenue ceiling for suppliers.
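Simple arithmetic illustrates the move from buyout to subscription. This minimal sketch uses made-up figures: a buyout price of 1,000 with no maintenance fees, and a subscription of 30 per month. A customer who keeps subscribing passes the buyout price in month 34, and payments keep growing after that. This is the revenue-sustainability effect described above.

```python
# Hypothetical comparison of a one-off buyout with a monthly subscription.
# Prices are illustrative; real buyouts often add maintenance fees, ignored here.


def cumulative_subscription(monthly_fee: int, months: int) -> list[int]:
    """Total paid by a subscriber at the end of each month 1..months."""
    return [monthly_fee * m for m in range(1, months + 1)]


def breakeven_month(buyout_price: int, monthly_fee: int) -> int:
    """First month in which cumulative subscription payments reach the buyout price."""
    return -(-buyout_price // monthly_fee)  # ceiling division on integers


if __name__ == "__main__":
    buyout, fee = 1000, 30
    totals = cumulative_subscription(fee, 36)
    print(breakeven_month(buyout, fee))  # 34
    print(totals[-1] - buyout)           # 80 more than the buyout after three years
```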

► Competitive landscape: the entry of new vendors, and how well incumbents adapt to new technologies, have reshaped the existing landscape. Take the database market as an example. Over the past decade, changes in its landscape were driven mainly by two forces: the entry of cloud vendors and cloud-native independent database vendors, and how effectively traditional database companies moved to the cloud. By analogy, we believe the entry of AI-native tool software vendors, together with how quickly and how well existing vendors integrate AI, may also reshape the competitive landscape.

► Value sharing: underlying infrastructure vendors provide general capabilities, while upper-layer application vendors focus on vertical scenarios. In the cloud computing industry chain, IaaS and PaaS vendors provide general software and hardware infrastructure, while SaaS vendors focus on vertical functional applications. By analogy, underlying AI platform vendors provide general large-model capabilities, while upper-layer tool software vendors look for scenarios suited to AI enablement and monetization. On the computing costs of AI, we believe AI vendors will bear the training costs, while AI vendors and application software vendors will share the subsequent inference costs. Just as cloud computing rents out computing resources, the future AI industry will rent out models and computing power.
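The proposed cost split can be written as a short allocation rule. This minimal sketch assumes the AI vendor pays all of the training cost plus part of the inference cost. The application vendor pays a fixed fraction of inference, for example through API fees. The cost figures and the 50% ratio are hypothetical. `Fraction` keeps the split exact.

```python
from fractions import Fraction

# Hypothetical cost-allocation sketch: the AI vendor bears training cost alone,
# and inference cost is shared. Figures and the sharing ratio are assumptions.


def allocate_costs(training: int, inference: int, app_share: Fraction) -> dict[str, Fraction]:
    """Split costs given the application vendor's fraction of inference cost."""
    if not 0 <= app_share <= 1:
        raise ValueError("app_share must be between 0 and 1")
    return {
        "ai_vendor": training + inference * (1 - app_share),
        "app_vendor": inference * app_share,
    }


if __name__ == "__main__":
    costs = allocate_costs(training=1000, inference=200, app_share=Fraction(1, 2))
    print({party: int(amount) for party, amount in costs.items()})
    # → {'ai_vendor': 1100, 'app_vendor': 100}
```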

Chart: In the long run, AI may reshape the business logic of tool software as cloud computing did

"Moving to the cloud" has become a "required course" for application software, and we think "AI+" may likewise become a standard feature of application software. Supporting cloud deployment is now essentially a must-have capability for software vendors. Most software companies founded after 2010 chose a cloud-native technology path, traditional software companies have actively moved to the cloud, and business models have shifted to subscriptions. Likewise, we expect "AI+" to become standard in the new generation of application software. Application vendors should also arrive at a new, mature set of business models as they explore and work through integration alongside AI vendors.

After reshaping business models, cloud computing drove a revaluation of application software, and AGI may likewise bring a new round of value release. Cloud computing changed how software is developed, deployed, delivered and charged for, and so upgraded business models and business logic. This in turn led capital markets to revalue tool software and the application software industry as a whole. We believe software enabled by generative AI may bring a new round of value release in the long run. In the short term, however, current large models still have many defects. Downstream applications and incremental paid scenarios are still being explored, and copyright, laws and regulations need further discussion. The conjectures above therefore remain highly uncertain and need continuous tracking.

To sum up, AI-integrated tool software has broad potential but still faces many practical challenges. Realizing that potential depends on advances in underlying computing power and the continued evolution and iteration of large-model algorithms. Legal and ethical issues also still need to be discussed and resolved. We believe the future of AGI applications is bright but the road will be tortuous. We emphasize that AI should be neither overhyped in the short term nor underestimated in the long term. We suggest investors keep following the latest industry developments and watch for potential application scenarios of AI-integrated tools.

Chart: AIGC keeps achieving breakthroughs in key technologies, and AI-integrated tools have broad potential; we emphasize not overhyping them in the short term nor underestimating them in the long term

Industry practice and application trends of generative AI-enabled tool software

Generative AI and text creation: ChatGPT is expected to accelerate the adoption of AI text creation

Generative AI can write, rewrite, correct and translate in text creation scenarios. Text creation tools can be trained on the abundant text data available on the Internet, and Transformer-based large models are already relatively mature in natural language tasks. We therefore think text creation is likely to be one of the first scenarios where generative AI is deployed at scale. Notion and Microsoft have begun integrating AI language models into note-taking and office software, and 4Paradigm has launched AIGC tools for enterprise customers. Kingsoft Office, the leading office software vendor, is also expected to add AI and improve text creation efficiency over the medium to long term. We believe generative AI offers four major capabilities in text creation:

► Writing: trained on massive corpora, Transformer neural networks can understand language and generate text. They can therefore produce logically coherent, fact-rich passages from simple user instructions.

► Rewriting: compared with ordinary language models, large language models have some reasoning ability and can form chains of thought to solve abstract problems. They can therefore rewrite text according to user requirements.

► Correction: by learning and summarizing patterns from massive text data, generative AI can fix spelling, grammar, punctuation and other errors in a given text, bringing it closer to standard usage.

► Translation: generative AI can use recurrent and convolutional neural networks to break down complex passages and translate them in context, greatly improving the completeness, accuracy and readability of translations.

Chart: Four capabilities of generative AI in text creation
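All four capabilities can run on a single model, with each one using a different instruction. This minimal sketch assumes a generic `complete(prompt)` function that calls some LLM. That function is hypothetical, and no specific vendor API is implied. The template wording is illustrative only.

```python
from enum import Enum
from typing import Callable


class Task(Enum):
    WRITE = "write"
    REWRITE = "rewrite"
    CORRECT = "correct"
    TRANSLATE = "translate"


# Illustrative prompt templates (assumed wording, not any product's prompts).
TEMPLATES: dict[Task, str] = {
    Task.WRITE: "Write a coherent passage following these instructions:\n{text}",
    Task.REWRITE: "Rewrite the text below. Requirement: {extra}\n\n{text}",
    Task.CORRECT: "Fix spelling, grammar and punctuation errors in:\n{text}",
    Task.TRANSLATE: "Translate the text below into {extra}, using the surrounding context:\n{text}",
}


def build_prompt(task: Task, text: str, extra: str = "") -> str:
    """Fill the template for one capability; `extra` carries a style or target language."""
    return TEMPLATES[task].format(text=text, extra=extra)


def run(task: Task, text: str, complete: Callable[[str], str], extra: str = "") -> str:
    """Send the prompt to `complete`, a hypothetical function wrapping some LLM API."""
    return complete(build_prompt(task, text, extra))


def echo(prompt: str) -> str:
    """Stand-in model that returns its input unchanged, for trying the sketch offline."""
    return prompt


if __name__ == "__main__":
    print(run(Task.CORRECT, "Teh report are due tomorow.", complete=echo))
```

With this design, adding a new capability only means adding a template, and the model backend can be changed without touching the feature code.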

Case 1: Notion AI optimizes text creation

Notion AI can generate rich text from simple instructions. Notion AI is the AI tool built into Notion's products. It combines machine learning and NLP technology to make text creation faster and easier for users. Backed by a large language model, it needs only a statement of the user's basic needs to generate rich text for scenarios such as meeting agendas, sales emails and press releases. Notion AI also offers summarization, error correction, translation, continuation and brainstorming. It will also serve as an interface to the Notion knowledge base: users enter what they are looking for, and Notion AI surfaces the relevant information. We expect Notion AI's automatic generation, summarization and editing features to substantially improve users' creative workflow and experience and to greatly strengthen Notion's products.

Case 2: Microsoft's plan to integrate AI into Office

With AI integrated, Microsoft Office is expected to offer a better product experience. Microsoft invested US$1 billion in OpenAI in 2019 and established a deeper partnership with it. Microsoft now plans to integrate OpenAI's next-generation language model into Word, PowerPoint, Outlook and other Office applications, so that users can obtain generated text by entering simple instructions. The new Office will offer automatic summarization, content suggestions and text generation. It will also provide a chatbot sidebar for user interaction, similar to the ChatGPT sidebar in Bing.

A large user base and abundant training data should help Office's AI capabilities iterate quickly. Office has a clear advantage in user scale, with about 1.5 billion PC installations worldwide as of 2021. Integrating OpenAI's technology into Office gives AI a high-quality deployment scenario. At the same time, Office's huge user base could supply AI with a steady stream of massive training data. Together these could form a flywheel that continually improves AI text creation.

Case 3: Moli Table embeds AI text processing in spreadsheets

Moli Table uses large AI models to "batch-compute" text in spreadsheet cells. It was jointly developed by ModelBest (Mianbi Intelligence) and OpenBMB, an open-source large-model community whose core members come from Tsinghua University. It exposes the AI model's text processing abilities as spreadsheet functions, so users call the model by entering a function in a cell. Supported functions currently include IE (information extraction), QA (question answering), MT (machine translation), SA (sentiment analysis) and TG (title generation), and they can be combined with Excel's basic functions. We think AI text processing inside spreadsheets enables batch processing of text and can greatly improve office efficiency.
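The sketch below shows the idea of batch-computing text with spreadsheet functions. It does not use Moli Table's actual syntax. `toy_model` is a hypothetical stand-in for a large-model call, using a keyword rule for SA and a first-words rule for TG. `apply_column` mimics filling a formula down a column, and `countif` shows how AI output can feed an ordinary spreadsheet function.

```python
from typing import Callable


def toy_model(task: str, text: str) -> str:
    """Hypothetical stand-in for a large-model call, handling two function codes."""
    if task == "SA":  # crude keyword sentiment, illustration only
        return "positive" if "good" in text.lower() else "negative"
    if task == "TG":  # "title": first three words, title-cased
        return " ".join(text.split()[:3]).title()
    raise NotImplementedError(task)


def apply_column(fn_name: str, cells: list[str],
                 model: Callable[[str, str], str] = toy_model) -> list[str]:
    """Batch-apply one AI function to every cell in a column, like filling a formula down."""
    return [model(fn_name, cell) for cell in cells]


def countif(values: list[str], target: str) -> int:
    """Counterpart of the spreadsheet COUNTIF function, to combine with AI output."""
    return sum(1 for v in values if v == target)


if __name__ == "__main__":
    reviews = ["Good battery life", "Screen broke in a week", "Good value for money"]
    sentiment = apply_column("SA", reviews)
    print(sentiment)                       # ['positive', 'negative', 'positive']
    print(countif(sentiment, "positive"))  # 2
    print(apply_column("TG", reviews))
```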

Chart: Moli Table brings AI text processing to spreadsheets

Case 4: 4Paradigm meets AIGC needs in enterprise scenarios

4Paradigm has launched Shi Shuo, an enterprise-grade GPT-style product that helps enterprises solve problems with their internal knowledge. Large generative language models have limitations in internal enterprise scenarios. 4Paradigm aims to overcome them, and to meet enterprise AIGC needs, by combining GPT-like language models with vertical-domain knowledge. Shi Shuo emphasizes three product features:

1) Data security: private deployment addresses enterprise customers' data security concerns.
2) Content credibility: answers draw on the enterprise's internal knowledge base and cite the original source of the information, making them more credible and reliable.
3) Controllable cost: computing costs are relatively controllable, and little data labeling is needed.

We believe B-end AIGC tools such as Shi Shuo can help enterprises reuse knowledge and improve production and management efficiency.
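The content-credibility feature, which answers from internal documents and cites the source, follows a retrieve-then-answer pattern. This minimal sketch assumes plain word-overlap retrieval and returns the retrieved passage in place of LLM-generated text. `Passage`, `answer_with_sources` and the document paths are hypothetical and do not describe Shi Shuo's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of "answer from internal documents and cite the source".
# Retrieval here is simple word overlap; production systems typically use
# embeddings and an LLM to phrase the answer, which this sketch omits.


@dataclass(frozen=True)
class Passage:
    source: str  # internal document identifier (illustrative)
    text: str


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_with_sources(question: str, passages: list[Passage], k: int = 2) -> tuple[str, list[str]]:
    """Return an answer built from the best-matching passages, plus their sources."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(q & words(p.text)), reverse=True)
    hits = [p for p in ranked[:k] if q & words(p.text)]  # drop passages with no overlap
    if not hits:
        return "No supporting internal document found.", []
    # Placeholder for generation: return the retrieved text itself.
    return " ".join(p.text for p in hits), [p.source for p in hits]


if __name__ == "__main__":
    docs = [
        Passage("hr/leave-policy.md", "Employees get 15 days of annual leave."),
        Passage("it/vpn.md", "Connect to the VPN before using internal tools."),
    ]
    print(answer_with_sources("How many days of annual leave do employees get?", docs))
```

Returning sources alongside the answer lets users check each claim against the original document.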

Chart: Working interface of 4Paradigm's Shi Shuo

Case 5: Emotibot applies AIGC to writing, dialogue, knowledge search and other scenarios

Emotibot has launched ChatGPT-like products to power enterprise-grade AIGC applications. Founded in 2015, the company provides AI solutions for finance, enterprises, healthcare, manufacturing, smart devices and government. In September 2022 it launched AI SaaS products, which give small and medium-sized enterprises cloud AI tools for customer service, sales, internal services and other scenarios. The company has also kept investing in AIGC. It previously launched several intelligent creative-writing products such as Magic Writer. It recently released the enterprise-grade Gemini GPT product series, which includes the enterprise chatbot KKBot and the interactive cognitive search engine ChatSearch. These products use AI across sales and customer service, human-computer interaction, knowledge exploration and more.

Case 6: Yinxiang Biji assists text creation with a self-developed lightweight large model

Yinxiang Biji has launched the "Yinxiang AI" generative text tool, based on its self-developed "Elephant GPT" model. Yinxiang Biji (Evernote China) is a domestic note-taking app vendor. Since 2019 it has explored AI for note-taking and text processing, launching tools such as intelligent recommendations, smart tags, intelligent summaries and a knowledge star map. It has also kept investing in large-model R&D. In 2023 it independently built "Elephant GPT", a large language model drawing on OPT, BLOOM and other large language models with GPT-3.5-like architectures. On that basis, it embedded the "Yinxiang AI" generative text module in its note-taking products. This was the first time a Chinese vendor applied AI text creation through a self-developed model. Yinxiang Biji plans to optimize the model with reinforcement learning from human feedback (RLHF) and to combine it with private corpora to enable writing in a user's personal style.

Case 7: MiniMax opens up a new consumer (C-end) scenario

Unlike ChatGPT, which focuses on professional knowledge Q&A, MiniMax's Glow centers on chat and social features. Founded at the end of 2021, MiniMax has developed general large models covering text-to-vision, text-to-speech and text-to-text. In November 2022 it launched Glow, its first AI chatbot platform. On Glow, users can chat with existing agents or create their own from a brief description and refine them in later conversations. Each agent's dialogue, portrait and voice are generated by MiniMax's three modality models. ChatGPT leans toward question answering, search and text generation. Glow's agents, by contrast, have distinct backgrounds and personalities, and their conversations with users lean toward companionship, emotional interaction and role-play storytelling. We believe MiniMax's chatbots interact well with users and build strong user stickiness, opening up a new consumer scenario.

Case 8: Potential AI application scenarios for Kingsoft Office

Kingsoft Office has a solid foundation in AI. As China's leading office software vendor, it has broad technology and business capabilities in computer vision, natural language processing, speech processing and other AI fields. It began building an AI center in 2017 and has developed nearly 100 AI capabilities for office use. In natural language processing, Kingsoft Office has built an assisted-writing feature. The user provides an outline, and AI generates text from it using corpus-based algorithms. The user can then treat the generated text as a draft, which greatly improves writing efficiency. Kingsoft Office also offers AI proofreading, translation and error correction, which it treats as important incremental features of the WPS office suite.

We expect Kingsoft Office to follow the AI industry trend, enter at the right time, and focus on the application side. Its WPS products have a large user base and diverse, complex usage scenarios. Kingsoft Office could dig deeply into these scenarios and offer AI text creation services for email, office work, marketing, government affairs, literature and other segments. We believe this would enhance the user experience and deepen its product moat. We also expect the company to integrate AI capabilities in a timely manner after fully evaluating domestic AI model vendors. This would realize as much of AI models' potential in office software as possible.

Generative AI and Audio Generation: cross-modal applications enter the Audio Industry

Overseas case 1: several Google teams have produced audio generation research

In 2023 Google released several audio generation models, each with its own characteristics. Earlier attempts at AI music creation include Riffusion, which generates music from visual spectrograms, Google's AudioLM and OpenAI's Jukebox. Much of the current work relies on diffusion models and labeled audio data: it extracts data features and pairs text with audio, which makes text-to-audio generation possible.

► MusicLM: a model that generates high-fidelity music from text descriptions. A user might enter, for example, "calm violin melodies with distorted guitar improvisation". MusicLM casts conditional music generation as a hierarchical sequence-to-sequence modeling task. It can generate several minutes of music at 24 kHz, and it outperforms previous models in both audio quality and adherence to the text description. MusicLM can also transform an input melody according to a text description, and it can generate matching music from a painting together with its text description.

► Noise2Music: applies a series of diffusion models to generate 24 kHz audio clips. To build its training set, it uses two deep models to pseudo-label a large audio dataset. First, a large language model generates descriptive music text. Then a pretrained joint music-text embedding model assigns those texts to the audio through zero-shot classification. Noise2Music can follow fairly complex prompt semantics and generate different styles, such as "an alto sings a slow jazz ballad in a live performance". It can also imitate instruments such as piano, saxophone and African drums.

► SingSong: this model automatically generates instrumental accompaniment for an input vocal track. It builds on source separation and on audio generation from the human voice. Users only need to supply their own vocals to get a matching instrumental accompaniment. In a listening test, the researchers played evaluators pairs of 10-second clips that had the same vocals but different accompaniments. SingSong received significantly better ratings than the baseline models.
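The pseudo-labeling step described for Noise2Music amounts to zero-shot classification in a shared embedding space: each unlabeled clip receives the candidate caption whose embedding lies closest to its own. The minimal sketch below assumes the embeddings have already been computed. The toy 3-dimensional vectors and the `pseudo_label` helper are hypothetical stand-ins for a real joint music-text model and do not represent Google's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pseudo_label(audio_emb, caption_embs):
    """Return the caption whose embedding is most similar to the audio embedding.

    caption_embs maps caption text -> embedding in the same joint space.
    """
    return max(caption_embs, key=lambda c: cosine(audio_emb, caption_embs[c]))

# Hypothetical toy embeddings; real joint models use hundreds of dimensions.
captions = {
    "slow jazz ballad": [0.9, 0.1, 0.0],
    "upbeat electronic dance track": [0.0, 0.2, 0.95],
    "solo piano piece": [0.3, 0.9, 0.1],
}
clip_embedding = [0.8, 0.2, 0.1]
print(pseudo_label(clip_embedding, captions))  # slow jazz ballad
```

Running this labeling over a large unlabeled audio collection produces (audio, text) pairs that a text-to-audio model can then be trained on.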

Overseas case 2: British academic institutions propose AudioLDM to improve quality and reduce computing costs

AudioLDM addresses two problems in text-to-audio research: limited output quality and high computational cost. The University of Surrey and Imperial College London jointly released and open-sourced AudioLDM, a framework based on a latent diffusion model and contrastive learning. The model improves the quality of audio generated from text. It can be trained on audio data alone and still match or exceed models trained on audio-text pairs. Training it consumes relatively few computational resources, and it can transfer or imitate sound styles without additional training.

Domestic case 1: iFLYTEK launches a new training framework to optimize speech prosody

iFLYTEK launched the SMART-TTS framework and deployed it on the iFLYTEK Open Platform and in its dubbing and content-creation products. SMART-TTS does not learn a direct mapping from text to audio features. It breaks the speech-synthesis learning process into modules and pre-trains each module to strengthen it. The framework offers 11 emotions, such as "happy", "apologetic" and "sad", and each emotion can be adjusted across 20 intensity levels. It also controls pauses, stress and speaking rate, so digital humans can convey emotion in their voices the way real people do. In addition, iFLYTEK's speech synthesis supports 37 languages, 11 Chinese dialects and 2 ethnic-minority languages, as well as natural mixed Chinese-English synthesis.
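An application that exposes controls like these would need to validate each request before passing it to a synthesis model. The sketch below shows one way to do that with a small request object. The class, its field names and the 1-to-20 intensity range mapped onto (0, 1] are hypothetical choices. Only three of the 11 emotions are listed because the text names only three, and pause and stress controls are left out for brevity. This is not iFLYTEK's API.

```python
from dataclasses import dataclass

# Hypothetical subset of the 11 emotions; the text names only these three.
KNOWN_EMOTIONS = {"happy", "apologetic", "sad"}
MAX_INTENSITY = 20  # the text cites 20 intensity levels per emotion

@dataclass
class ProsodyRequest:
    """Hypothetical TTS request object; not a real vendor API."""
    text: str
    emotion: str = "happy"
    intensity: int = 10   # discrete level, 1..20
    speed: float = 1.0    # relative speaking rate, must be positive

    def __post_init__(self):
        # Reject values outside the ranges the framework is described as supporting.
        if self.emotion not in KNOWN_EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if not 1 <= self.intensity <= MAX_INTENSITY:
            raise ValueError("intensity must be between 1 and 20")
        if self.speed <= 0:
            raise ValueError("speed must be positive")

    def intensity_fraction(self) -> float:
        """Map the discrete 1..20 level onto (0, 1] for a downstream model."""
        return self.intensity / MAX_INTENSITY

req = ProsodyRequest("Sorry for the wait.", emotion="apologetic", intensity=15)
print(req.intensity_fraction())  # 0.75
```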

Domestic case 2: Unisound (Yunzhisheng), a domestic AI voice generation "unicorn"

Besides text-to-music, speech synthesis is another important direction in audio generation. Unisound, a domestic "unicorn", provides speech synthesis products and services, including text-to-speech, voice library customization and voice cloning. Its speech synthesis converts text into natural, fluent speech, offers a wide range of timbres and emotions, and lets users adjust volume, speed and pitch. Voice library customization is aimed mainly at enterprise customers: Unisound uses deep learning to build a custom voice library that gives the customer an exclusive branded (IP) voice. Voice cloning records a small sample of a user's voice and quickly produces a voice model with a similar timbre and speaking style. These functions suit audio scenarios such as intelligent customer service, smart hardware, news broadcasting and dubbing for self-media creators.

Generative AI and image creation: cross-modal generation opens up rich possibilities

Building on the release and open-sourcing of CLIP and diffusion models, DALL·E 2 and Stable Diffusion reached practical use in 2022. Cross-modal generation such as text-to-image then became the main path for AIGC adoption. OpenAI drew on its large-model foundation, large volumes of paired image-text data in open-source databases, computing power from leading vendors and lower barriers to entry. With these, it released DALL·E 2, an upgraded text-to-image model that brought AI painting (cross-modal generation of images from text) into practical use and set off a wave of AI art. In August 2022, Stability AI open-sourced Stable Diffusion. This sharply lowered the barrier to cross-modal AIGC applications in AI painting and opened an era of "industrial-scale" creation by the general public. Overseas, the application layer has since produced products, fine-tuned models and plug-ins such as Midjourney, ChilloutMix and ControlNet, which keep improving image quality and gradually drive the commercialization of AI image creation.

Overseas case 1: DALL·E and DALL·E 2, pioneers of text-to-image generation

OpenAI first launched DALL·E in 2021 and commercializes the technology through the Azure OpenAI Service. The upgraded DALL·E 2 was released in April 2022. DALL·E 2 builds on CLIP, the image-text matching model that OpenAI released in 2021, which lets it connect text with visual images. It also uses GLIDE, a diffusion-based image generation model, to generate realistic images from text. Compared with its predecessor, it offers 4 times the resolution, higher accuracy and a broader range of uses. DALL·E 2 has three functions: 1) generating an image from a text prompt, 2) generating a new image from a given image, and 3) editing image elements with text.

DALL·E 2 currently uses a pay-per-use credit model. After joining the open beta, users receive 50 free credits in the first month, with each credit corresponding to one generation. After that, they receive 15 free credits each month, and additional credits cost $15 for 115. DALL·E 2 generates more realistic and accurate images than DALL·E. It also renders scenes more completely and can edit existing images from natural-language descriptions. Compared with other models in the field, DALL·E 2 offers greater controllability, strong handling of spatial relationships and highly realistic image rendering. Its mature technology led the way in turning AI painting from an idea into reality. In July 2022, DALL·E 2 opened an invitation-based public beta, which was an important driver of the surge in interest in AIGC in 2022.
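The credit model reduces to simple arithmetic. The sketch below uses only the figures quoted above. It counts the free credits granted over a given number of months and computes the effective price of one paid credit. The text does not say whether unused free credits roll over, so the total is the number of credits granted, which may be more than a user can hold at any one time.

```python
# Figures quoted in the text: 50 free credits in the first month,
# 15 free credits in each later month, and $15 for 115 paid credits.
FIRST_MONTH_FREE = 50
MONTHLY_FREE = 15
PACK_PRICE_USD = 15
PACK_CREDITS = 115

def free_credits(months: int) -> int:
    """Total free credits granted over the given number of months."""
    if months <= 0:
        return 0
    return FIRST_MONTH_FREE + MONTHLY_FREE * (months - 1)

def price_per_credit() -> float:
    """Effective price in USD of one credit bought in a paid pack."""
    return PACK_PRICE_USD / PACK_CREDITS

print(free_credits(12))              # 215
print(round(price_per_credit(), 3))  # 0.13
```

Because each credit corresponds to one generation in this model, the per-credit price of roughly $0.13 is also the marginal cost of one paid generation.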

Overseas case 2: Stability AI open-sources Stable Diffusion and takes AI painting mainstream

Stability AI was founded in 2020 and built its core capability around Stable Diffusion, which it launched and open-sourced in 2022. Its seed round gave it a post-money valuation of more than $1 billion, making it a unicorn at the seed stage. Stable Diffusion is based on the latent diffusion model. It generates images by iteratively "denoising" an input and then decoding the output, and it uses spatial dimensionality reduction to ease memory and inference bottlenecks. As a result, users can quickly generate high-resolution images on consumer-grade graphics cards. The open-source ecosystem built around the model offers an initial answer to AIGC's data, model and computing-power problems. It directly lowers the barrier for users and is spreading into many vertical fields.
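Two ideas in this paragraph can be made concrete. The first is "spatial dimensionality reduction", which compresses an image from pixel space into a much smaller latent space. The second is "denoising": it reverses a diffusion process whose forward (noising) step has a closed form. The sketch below assumes the commonly cited Stable Diffusion v1 configuration of 8× spatial downsampling to a 4-channel latent, together with the standard DDPM linear noise schedule. Both are assumptions not stated in the text, and the trained denoising network is omitted.

```python
import math
import random

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent an autoencoder would produce (assumed SD v1 settings)."""
    return (height // downsample, width // downsample, latent_channels)

def compression_ratio(height, width, channels=3, **kw):
    """How many times fewer values the latent holds than the RGB image."""
    h, w, c = latent_shape(height, width, **kw)
    return (height * width * channels) / (h * w * c)

def alpha_bar(t, steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_i) for i = 0..t under a linear schedule."""
    prod = 1.0
    for i in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        prod *= 1.0 - beta
    return prod

def noise_latent(x0, t, rng):
    """Forward diffusion in closed form: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]

print(latent_shape(512, 512))       # (64, 64, 4)
print(compression_ratio(512, 512))  # 48.0
rng = random.Random(0)
print(noise_latent([0.5, -0.2, 0.1], t=999, rng=rng))  # almost pure noise at the last step
```

A sampler runs this process in reverse. At each step a trained network predicts the noise to remove, and working on the 48× smaller latent instead of raw pixels is what makes that loop affordable on consumer GPUs.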

Overseas case 3: Midjourney, a breakout AI image application with a proven business model

Midjourney has built a closed-source text-to-image model based on CLIP and diffusion. It has more than 10 million community members and more than $100 million in revenue. The product runs inside the Discord community: users invite the Midjourney bot to a channel and type a prompt beginning with "/imagine" to generate the image they want. Midjourney collects feedback from users' choices among the generated results. This has given it a large, distinctive dataset that acts as a competitive barrier. Midjourney needs only short prompts to produce high-quality images with a sci-fi aesthetic, which makes it popular with designers, Web3 and NFT practitioners, and individual users. Its paid SaaS subscription model is already profitable.

Domestic AI image creation started later than the cutting-edge work overseas, but it has made real progress, and a number of innovative products and technologies have emerged. Baidu's Wenxin Yige and Wanxing Technology's Wanxing AI Painting are representative examples. They demonstrate domestic AI painting capabilities and have also introduced innovations such as "AI sketch drawing", which expands how users interact with the creative process and improves efficiency and user experience.

Domestic case 1: Baidu builds on its Wenxin large model, with AI painting capabilities that benchmark against overseas products

Wenxin Yige is Baidu's first AI painting product, built on the PaddlePaddle deep learning platform and the Wenxin (ERNIE) large model. It generates images from text in more than ten styles, such as traditional Chinese (guofeng), oil painting, watercolor, gouache, anime and realism. This gives professional content creators a creative platform and lets entry-level and general users realize their ideas. Putting the product into practice posed three challenges: understanding creative intent, generating original images and satisfying creative needs. Wenxin Yige answered them with three technical innovations: knowledge-enhanced prompt learning, deep cross-modal fusion of text and images, and text-driven image editing. Together, these support creative planning, detailed description and multi-round interaction, which improve output quality.

Domestic case 2: Wanxing Technology invests in AIGC painting, a benchmark case of OpenAI empowering a domestic vendor

Wanxing Technology has focused on overseas markets for 20 years. It connected to OpenAI's API to build a new creative tool for drawing: Wanxing AI Painting. The product is positioned for professional creation, "AI-generated high-quality artworks", and offers two modes: random generation and keyword-based creation. Users enter keywords and choose an aspect ratio and art style, and they receive an AI-generated painting within 30 seconds. Supported styles include hand-drawn, cyberpunk, chibi ("Q-version"), CG digital rendering and more. The product supports creation in both Chinese and English, and users can emphasize keywords with exclamation marks and parentheses.
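The text says keywords can be emphasized with exclamation marks and parentheses, but it does not give the exact rules. As a hypothetical illustration, the sketch below borrows the convention popularized by the open-source Stable Diffusion web UI, in which each enclosing pair of parentheses multiplies a keyword's weight by 1.1. It adds an assumed rule that each trailing "!" does the same. Wanxing's actual syntax and weights may differ.

```python
EMPHASIS = 1.1  # assumed weight multiplier per parenthesis pair or '!'

def parse_keywords(prompt: str) -> dict[str, float]:
    """Split a comma-separated prompt into keyword -> weight (hypothetical rules)."""
    weights = {}
    for raw in prompt.split(","):
        token = raw.strip()
        if not token:
            continue
        # Count and strip trailing '!' outside any parentheses, e.g. "(castle)!".
        bangs = len(token) - len(token.rstrip("!"))
        token = token.rstrip("!").strip()
        # Peel matching outer parentheses, e.g. "((castle))" -> depth 2.
        depth = 0
        while token.startswith("(") and token.endswith(")"):
            token = token[1:-1].strip()
            depth += 1
        # Count and strip any '!' left inside the parentheses, e.g. "(castle!)".
        inner_bangs = len(token) - len(token.rstrip("!"))
        token = token.rstrip("!").strip()
        weights[token] = round(EMPHASIS ** (depth + bangs + inner_bangs), 4)
    return weights

print(parse_keywords("((castle)), sunset!, watercolor"))
# {'castle': 1.21, 'sunset': 1.1, 'watercolor': 1.0}
```

A generator would then use these weights to scale how strongly each keyword conditions the image. The exact mechanism for that is model-specific and is not shown here.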

In February 2023, Wanxing AI Painting became the first in the industry to launch "AI sketch drawing". This made it the world's first AI painting software to generate pictures from user-drawn sketches through interaction, and it marked a new era both for Wanxing's product and for AI painting. Sketching requires fewer prompts than earlier approaches: with just a few strokes, users can generate a high-quality painting in 5 seconds. Users can also iteratively improve the model by selecting the images they prefer as feedback. Because they draw the sketch themselves, users feel more involved in the creation, and the process is more engaging.

Chart: Wanxing "AI Painting" creation interface

Generative AI and video creation: cross-modal generation is still at an early stage but is expected to raise the application ceiling

Benchmark cases from overseas technology giants have expanded what people imagine AI video creation can do. In September 2022, Meta released Make-A-Video, which generates short videos in seconds from a few words or sentences. Just a week later, Google released Imagen Video and Phenaki, which target high-quality video and long-duration video respectively. Cross-modal video generation in AIGC still has clear shortcomings. AI-generated video suffers from blurring and distorted objects, and it cannot yet produce longer scenes that tell a story in coherent detail. Nevertheless, we believe AIGC video generation is likely to achieve technological breakthroughs and raise the application ceiling.

Case 1: Make-A-Video achieves cross-modal generation from text to video

Make-A-Video generates video from text. It builds on Make-A-Scene, the text-to-image model Meta released in July 2022. Given a text input, Make-A-Video generates a few seconds of video in a choice of styles. Besides text-to-video, it can also take a single image or two images as input and create motion from them (image-to-video).

Case 2: Google continues to produce results in cross-modal video generation

Google has worked on both text-to-video and image-to-video. A week after Meta launched Make-A-Video, Google launched Imagen Video and Phenaki. Imagen Video produces higher picture quality but shorter clips, while Phenaki's output is lower in quality but can run longer than 2 minutes. In November 2022, Google released a video that combined the two models for the first time, balancing quality and length. On February 2, 2023, Google proposed Dreamix, a new video editing method that can edit existing videos and generate videos from supplied images and descriptions.

Case 3: Runway's Gen-1 model improves the quality of generated video

The Gen-1 model generates video in a wide range of styles. Runway was founded in 2018 and is one of the co-developers of Stable Diffusion. In February 2023, it launched Gen-1, an AI video generation model. Gen-1 synthesizes a new video by applying the composition and style of an image or text prompt to the structure of a source video, which is a step forward in the quality and length of generated video.

Domestic vendors: also at an early stage of exploration, helping improve creative efficiency

Domestic vendors are also at an early stage of exploration in video generation. When they apply AIGC to video, they focus more on content creation and quality upgrades, such as changing video attributes and "assembly-line" content creation. For now, these tools serve mostly business (B-end) customers by raising productivity for content creators.

► Text-to-video: in May 2022, Tsinghua University and the Beijing Academy of Artificial Intelligence (BAAI) jointly released CogVideo, a model based on the Transformer architecture. It is the industry's first open-source text-to-video AI model. However, the videos it generates have low resolution and limited length, and it currently accepts only Chinese input.

► Image quality enhancement and restoration: Danghong Technology (Arcvideo) has relatively mature picture-quality enhancement products, including video frame interpolation, video detail enhancement, video picture-quality enhancement, and restoration and colorization of old images.

► Automatic video creation: VidPress, an intelligent video creation tool incubated by Baidu, produces video automatically once a user imports a link to a text-and-image article, adding dubbing, subtitles and visuals. It has provided intelligent video generation to media organizations such as People's Daily and to end users on the Baijiahao and Haokan Video platforms.

► Intelligent script creation: the "video element analysis" feature of SenseTime's Zhiying extracts and analyzes elements of a video such as characters, scenes, props and dialogue. From these it automatically generates shot-by-shot scripts with 98% accuracy, and it also extracts popular video style elements. This can substantially cut script-writing time and help advertisers save on content production costs.
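Video frame insertion (interpolation), mentioned in the list above, synthesizes intermediate frames between existing ones, for example to raise a clip's frame rate. Production systems estimate motion, typically with optical flow. The minimal sketch below shows only the simplest baseline, a linear cross-fade between grayscale frames stored as nested lists, to make the idea concrete. It is not the method of any vendor named above.

```python
def blend(frame_a, frame_b, t):
    """Linearly mix two equally sized grayscale frames; t=0 gives a, t=1 gives b."""
    return [
        [(1 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

def interpolate(frames, factor=2):
    """Insert (factor - 1) blended frames between each consecutive pair."""
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(blend(a, b, k / factor))
    out.append(frames[-1])
    return out

clip = [[[0, 0]], [[100, 200]]]  # two tiny 1x2 frames
print(interpolate(clip))  # [[[0, 0]], [[50.0, 100.0]], [[100, 200]]]
```

Plain blending leaves "ghosting" on moving objects. That is why commercial tools warp pixels along estimated motion instead of averaging them in place.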

Because the technology is still immature, video created entirely by AI cannot yet be sold directly to business (2B) customers, but AI is already contributing to commercial production as an assistant. On January 31, 2023, Netflix Japan, rinna and WIT STUDIO released "The Dog & The Boy", the first release-grade animated film made with AIGC assistance. The animation runs more than 3 minutes and used AIGC for part of its scene rendering. This shows that AI has begun to reach commercial use as an aid to video creation. Even so, there is still a long way to go before it is applied to large-scale projects and generates significant revenue.

Some vendors that build on self-developed sparse models to serve vertical fields have also assembled multimodal product matrices. Mobvoi, for example, has built a multimodal AIGC product matrix spanning text, images, voice, video and digital humans, which provides one-stop content generation tools. Mobvoi launched its first commercial AIGC product, the dubbing platform "Magic Voice Workshop", in 2020. Since then it has expanded across AI voice, AI writing, AI image generation, voice and image cloning, digital-human video and other AIGC areas, with multiple product lines serving a wide range of business scenarios.

Generative AI and 3D model creation: built on parametric modeling, enabled by GPT's language processing

3D modeling for industrial scenarios demands strong AI capabilities, and generative design cannot yet fully support it. Unlike images and videos, 3D models are used mainly in industrial production, which calls for more rigorous and rational modeling and creative ability. AI tools such as ChatGPT currently lack strong mathematical and logical capabilities, so generating models directly from text descriptions with generative AI has progressed relatively slowly. Designing large assemblies such as aircraft and ships also requires highly rigorous processes and parameters, and we think generative AI can offer only limited design support at that scale. For now, we observe that AI is deployed in 3D CAD and EDA mainly as "AI Inside" enablement.

Generative Design in 3D CAD: AI Inside Enablement based on Parametric Modeling

Generative design in 3D CAD mainly relies on AI's ability to generate a large number of candidate models. According to PTC's official website, generative design for 3D models starts from the constraints the designer sets (space, materials, manufacturing method, cost and so on) and the design objectives. AI then quickly generates target models that meet these requirements, and designers choose suitable models for further design and optimization, which significantly improves design efficiency. We observe that current AI applications in 3D CAD fall into two main categories:

► AI-assisted parameter optimization: typically used when refining a 3D CAD model, based on CAE simulation results that show, for example, excessive stress or visible deformation in some parts. The system generates a large number of candidate parameters for the parts to be optimized and filters them against the constraints imposed by the other parts to arrive at an optimized result.

► AI-generated sketches: for example, the xDesign modules of CATIA and SolidWorks have introduced AI-assisted sketch creation, which recommends shapes based on given parameters and materials. This helps engineers lay down the underlying geometry and speeds up the overall design.

Generative design in 3D CAD is built on parametric modeling, which has a long history. In 1987, PTC's Pro/E introduced history-based parametric modeling for the first time, and today every mainstream 3D CAD product offers parametric modeling. AI-assisted parameter optimization and sketch generation both work the same way at heart: they generate large numbers of parameter sets under given constraints and derive design options from them for designers to choose from. Mainstream 3D CAD products such as CATIA, NX, Pro/E, SolidWorks and Solid Edge all have AI modules for assisted design.
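The "generate many parameter sets, filter by constraints, let the designer choose" loop described above can be sketched as a simple random search. The example is hypothetical. It sizes a rectangular cantilever beam, using width and height as the parameters, under an assumed allowable-stress limit, and it ranks feasible designs by cross-sectional area as a proxy for mass. The load, length, stress limit and the height ≤ 3 × width proportion rule are illustrative values. The bending-stress formula σ = 6FL/(bh²) is the textbook result for a rectangular section, not PTC's method.

```python
import random

LOAD_N = 1000.0         # assumed tip load in newtons
LENGTH_MM = 200.0       # assumed beam length in millimetres
MAX_STRESS_MPA = 150.0  # assumed allowable stress (N/mm^2)

def bending_stress(width, height):
    """Max bending stress of a rectangular cantilever, 6 F L / (b h^2), in MPa."""
    return 6 * LOAD_N * LENGTH_MM / (width * height ** 2)

def generate_candidates(n, rng):
    """Sample n (width, height) pairs in mm from an assumed design space."""
    return [(rng.uniform(5, 40), rng.uniform(5, 40)) for _ in range(n)]

def feasible(candidate):
    """Stress limit plus an assumed proportion rule (height at most 3x width)."""
    width, height = candidate
    return bending_stress(width, height) <= MAX_STRESS_MPA and height <= 3 * width

def shortlist(n=5000, keep=3, seed=42):
    """Return the lightest feasible designs (smallest cross-section area)."""
    rng = random.Random(seed)
    ok = [c for c in generate_candidates(n, rng) if feasible(c)]
    ok.sort(key=lambda c: c[0] * c[1])
    return ok[:keep]

for w, h in shortlist():
    print(f"width={w:.1f} mm, height={h:.1f} mm, stress={bending_stress(w, h):.0f} MPa")
```

Random sampling keeps the sketch short. Commercial tools use far more capable optimizers and CAE solvers in place of a closed-form stress formula, but the output is the same in kind: a shortlist of feasible options for the designer to choose from.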

AI Inside in EDA: design efficiency optimization based on existing design data

AI enablement is expected to help chip design achieve true "automation". Even in the relatively automated digital chip design flow, today's EDA tools still involve many manual operations by designers. We believe the higher degree of automation that AI brings could reduce repetitive work in the design process and free up more of designers' productivity. AI's role in EDA design tools can currently be divided into two levels: AI Inside and AI Outside. AI Inside generally refers to embedding AI in the design software to make the tools smarter and more efficient. AI Outside, by contrast, means machines accumulate experience through learning, so that they can to some extent replace manual work and become a new source of "productivity".

The back end of chip design, especially placement and routing, is the main application scenario for AI Inside in EDA. In the digital chip design flow, placement and routing is the most important back-end step. It determines the physical shape and placement of logic cells, and engineers must weigh many factors such as grid nodes, grid granularity and wiring density. As a result, placement and routing is usually one of the most time-consuming steps in digital chip design, and AI image recognition and optimization algorithms are expected to improve its efficiency significantly. Leading overseas EDA vendors such as Cadence and Synopsys already offer AI Inside capabilities for chip design:

► Cadence: in March 2020, Cadence released an updated version of its digital full-flow tools. The update uses iSpatial technology to integrate the Innovus placement-and-routing tool with the Genus synthesis tool and incorporates machine learning. Users can train iSpatial on their existing design data to minimize design margins during placement and routing.

► Synopsys: in 2020, Synopsys released DSO.ai, its AI application for EDA. According to the company's website, Design Space Optimization (DSO) uses machine learning algorithms to search large design spaces. It can optimize the input parameters and choices of a chip design workflow to meet a specific project's exact requirements.[1] We think it is essentially similar to the parameter optimization function in 3D CAD design.
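Placement quality is commonly scored with half-perimeter wirelength (HPWL), which sums, over every net, half the perimeter of the bounding box around that net's pins. The sketch below computes HPWL and improves a toy placement through greedy random swaps. This is a much simplified stand-in for the annealing-based and learning-based optimizers used in commercial tools. The cells, nets and grid are hypothetical, and the sketch does not show how Innovus, iSpatial or DSO.ai work internally.

```python
import random

def hpwl(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets.

    placement maps cell name -> (x, y); nets is a list of cell-name lists.
    """
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def improve_by_swaps(placement, nets, iters=2000, seed=0):
    """Greedy local search: swap two cells' locations and keep the swap if HPWL drops."""
    rng = random.Random(seed)
    best = dict(placement)
    best_cost = hpwl(best, nets)
    cells = list(best)
    for _ in range(iters):
        a, b = rng.sample(cells, 2)
        trial = dict(best)
        trial[a], trial[b] = best[b], best[a]
        cost = hpwl(trial, nets)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost

# Hypothetical 4-cell design on a 2x2 grid; A-B and C-D are connected.
start = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
nets = [["A", "B"], ["C", "D"]]
print(hpwl(start, nets))                 # 4
print(improve_by_swaps(start, nets)[1])  # 2
```

Greedy swaps can get stuck in local minima on real designs. Simulated annealing avoids this by occasionally accepting worse moves, and ML-based tools go further by learning which moves and tool settings tend to pay off.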

Looking ahead, AI Outside is expected to deliver true "chip design automation" at a higher level. AI Inside enhances the EDA tools themselves, whereas AI Outside focuses on the tool's user. Under AI Outside, EDA tools learn human design patterns and accumulate design experience, which reduces manual intervention and frees up productivity. Synopsys and Cadence have both explored design automation through AI Outside. We think the main obstacle to AI Outside at this stage is the cost of acquiring data: training requires highly reliable chip data, and chip design companies' data is hard to obtain. We think EDA companies may gradually move toward AI Outside by drawing on their close relationships with wafer foundries.

The fusion of generative design and GPT models: a potential path from text to model

Envisioning the fusion of generative design and GPT models: turning text descriptions into parameters. We believe large models such as GPT still have considerable room for application in 3D model design. One potential future direction is to use ChatGPT's language-processing ability to understand designers' requirements expressed as text. The model would interpret the text description and convert it into a set of model parameters, and 3D CAD generative design would then produce a corresponding design scheme from those parameters.

► Generative design is an existing technical foundation. Generative design for 3D models can already perform parameter optimization and sketch generation. As the technology matures, we think the step from given parameters to a generated 3D model is unlikely to be the bottleneck on the path from text to model.

► Converting text into parameters is the biggest difficulty in text-to-model generation. Current Transformer models excel at natural-language scenarios, but we think converting text into the parameters designers need remains hard. Breaking through this bottleneck, from textual description to parametric description, could pave the way for text-to-model generation. A 2021 DeepMind paper explored connecting graphics with sequences and used the natural-language processing capabilities of Transformer models to generate CAD sketches.
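The missing link identified above, turning a description into parameters, can be pictured as filling in a parameter schema. In the envisioned pipeline, a large language model would do the extraction. The toy below substitutes a few regular expressions purely to show the input and output shapes. The schema (`length_mm`, `width_mm`, `thickness_mm`, `material`) and the phrasing it recognizes are hypothetical.

```python
import re

# Hypothetical parameter schema a generative-design tool might accept.
PATTERNS = {
    "length_mm": r"(\d+(?:\.\d+)?)\s*mm\s+long",
    "width_mm": r"(\d+(?:\.\d+)?)\s*mm\s+wide",
    "thickness_mm": r"(\d+(?:\.\d+)?)\s*mm\s+thick",
}
MATERIALS = ("aluminum", "steel", "titanium", "nylon")

def text_to_parameters(description: str) -> dict:
    """Extract numeric dimensions and a material from a plain-English request."""
    text = description.lower()
    params = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            params[name] = float(match.group(1))
    for material in MATERIALS:
        if material in text:
            params["material"] = material
            break
    return params

request = "An aluminum bracket 120 mm long, 40 mm wide and 5 mm thick"
print(text_to_parameters(request))
# {'length_mm': 120.0, 'width_mm': 40.0, 'thickness_mm': 5.0, 'material': 'aluminum'}
```

Real requests are rarely this tidy. They may omit load cases, tolerances or manufacturing methods, and they often express requirements vaguely. Handling that ambiguity is plausibly why the text treats this conversion as the hardest step.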

DeepMind uses the natural-language processing capabilities of Transformer models to generate sketches. A sketch is the skeleton of a 3D model: its constraints define how the part keeps its intended shape when its parameters change. DeepMind's 2021 paper discussed the similarity between CAD sketching and natural-language modeling. It proposed a machine learning model that generates CAD sketches automatically and performs well on unconditional synthesis and image-to-sketch conversion. The paper's key contribution is mapping sketches onto sequences, which lets large Transformer models process them. As large Transformer models are applied more widely, we believe their integration with CAD may keep advancing, and applications that generate higher-level models from text may emerge.
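The core idea of the DeepMind work, treating a CAD sketch as a sequence so that a Transformer can model it, can be illustrated by flattening sketch entities and constraints into tokens. The token vocabulary below is a simplified, hypothetical scheme and not the serialization used in the paper. Coordinates are quantized onto an assumed 64-bin grid so that they become discrete tokens, as a language model requires, and constraints refer back to entities by index.

```python
from dataclasses import dataclass

GRID = 64  # assumed quantization: values in [0, 1) map to 64 bins

@dataclass
class Line:
    x0: float
    y0: float
    x1: float
    y1: float

@dataclass
class Circle:
    cx: float
    cy: float
    r: float

def q(value: float) -> str:
    """Quantize a value in [0, 1) to a discrete bin token."""
    return f"<{min(int(value * GRID), GRID - 1)}>"

def serialize(entities, constraints):
    """Flatten entities, then constraints that refer to entities by index."""
    tokens = ["<sketch>"]
    for e in entities:
        if isinstance(e, Line):
            tokens += ["LINE", q(e.x0), q(e.y0), q(e.x1), q(e.y1)]
        elif isinstance(e, Circle):
            tokens += ["CIRCLE", q(e.cx), q(e.cy), q(e.r)]
    for kind, *refs in constraints:
        tokens += [kind.upper()] + [f"#{i}" for i in refs]
    tokens.append("</sketch>")
    return tokens

sketch = [Line(0.1, 0.1, 0.9, 0.1), Line(0.1, 0.1, 0.1, 0.9), Circle(0.5, 0.5, 0.2)]
rules = [("perpendicular", 0, 1), ("coincident", 0, 1)]
print(serialize(sketch, rules))
```

A model trained on many such sequences learns to predict the next token. Generating a sketch then becomes next-token prediction, the same objective used for text, which is the correspondence the paragraph describes.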
Risk

Technological progress falls short of expectations: artificial intelligence is a cutting-edge emerging technology that is still developing rapidly, so its progress carries some uncertainty. If technological progress falls short of expectations, industrialization may slow.

Commercialization is slower than expected: commercial adoption is key to moving artificial intelligence smoothly to its next stage. If commercialization is slower than expected, it will weigh on the progress of artificial intelligence.

Industry competition intensifies: artificial intelligence is an industry hot spot with significant future business value. Technology giants and start-ups are all investing in the field, and competition in vertical categories and at the application layer may intensify further.

Editor: irisz

The translation is provided by third-party software.

The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.
