One of the highlights of the Google and OpenAI product duel: Can AI assistants become killer apps?

cls.cn · May 15 16:17

① The back-to-back releases of OpenAI's GPT-4o and Google's Astra show that technology companies place great importance on developing AI assistants; ② Judging from current use cases, these assistants are not yet essential to everyday life, but further development could well turn them into a “killer app.”

Cailian Press, May 15 (Editor: Zhou Ziyi) This week's top story in artificial intelligence was undoubtedly the product duel between OpenAI and Google.

OpenAI has a habit of releasing its own products just ahead of competitors' major launch events to seize the news spotlight, and this week was no exception.

OpenAI had already raised public expectations. On Monday (May 13), the company announced, as scheduled, an upgraded version of GPT-4 called GPT-4o (the “o” stands for “omni”). GPT-4o is designed to act as a personal assistant on a phone or tablet, with improved voice interaction, the ability to interpret and reason about photos taken by the device's camera, stronger language translation, and faster response times.

The technical innovation behind GPT-4o is impressive. The model is multimodal: it can take in and reason over audio, visual, and text input in real time, and generate any combination of text, audio, and image output. Unlike previous versions, it skips the step of converting the user's speech into text before processing it, which makes the whole pipeline faster.

GPT-4o also shortens the time the model needs to process a given number of tokens (for English text, one token is typically about three-quarters of a word), which makes the model faster and cheaper to run than GPT-4 Turbo, OpenAI's previous best model.

On Tuesday (May 14), Google answered with a string of major announcements of its own, taking on OpenAI directly.

At its I/O developer conference, Google announced a range of new AI features and upcoming products, including broad upgrades to the Gemini models, the future AI assistant “Astra,” generative AI features for Google Search, and a series of generative AI tools for images, music, and video.

At the conference, Google announced improvements to the Gemini 1.5 Pro model, expanding its context window from 1 million tokens to 2 million and giving it a more natural-sounding voice, better understanding of audio and images, stronger logical reasoning and planning, and better computer code generation.
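
To give a rough sense of scale for these context windows, here is a minimal sketch that converts a token budget into an approximate English word count. The 0.75 words-per-token ratio is a commonly cited rule of thumb and an assumption here, not a figure from Google's announcement; the function name `approx_words` is hypothetical, and real ratios vary by tokenizer and language.

```python
# Rule-of-thumb ratio for English text. This is an assumption for
# illustration only; actual ratios depend on the tokenizer and language.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


# Compare the old and new Gemini 1.5 Pro context window sizes.
for window in (1_000_000, 2_000_000):
    print(f"{window:,} tokens is roughly {approx_words(window):,} words")
```

Under this assumed ratio, doubling the window from 1 million to 2 million tokens doubles the estimated capacity from about 750,000 to about 1.5 million English words.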

Google also unveiled Project Astra, an advanced agent that sees and responds conversationally, built to handle multimodal input such as audio and video. Whereas OpenAI's GPT-4o can only process still images, Astra can also process video. In a demo video, it used a live camera feed to answer questions such as “what here makes a sound” and “where am I right now.” Its responses did lag noticeably, however. Future versions of Google's AI personal assistant are reportedly being developed through Astra.

A “Highlight Moment” for AI Assistants

The launches from OpenAI and Google show that technology companies place great importance on developing AI assistants, and that the title of “first AI killer app” has become hotly contested ground for every major player in Silicon Valley.

Judging from this week's launches, the AI assistants from OpenAI and Google each have their strengths. GPT-4o can take in and generate speech directly, skipping the speech-to-text conversion step, while Astra can process moving images such as video, which is a significant advantage.

These two launches clearly put two other Silicon Valley giants, Apple and Amazon, at a disadvantage. They need to upgrade their voice assistants, Siri and Alexa, to match the capabilities of these new competitors, or those products will be in trouble. From what is known so far, Amazon has access to the powerful Claude AI models from Anthropic, in which it has invested; earlier reports also indicated that Apple is in talks with OpenAI to license its technology in the near term.

But will these new AI assistants be the “AI killer app” of the future? That question remains open, and the answer depends entirely on what happens next.

Judging from their current use cases, AI assistants cannot yet be called ubiquitous, must-have products in daily life. Apart from translation, almost none of the demonstrated features actually help people get work done.

Some analysts suggest that this may change once these assistants become more “agentic.” If one day they truly understand a person's individual preferences, complete tasks accordingly, and take care of everyday chores (such as online shopping, filling out insurance forms, or booking vacations), AI assistants could well become a “killer app.”

Google says it is developing such products but has given no launch timeline; OpenAI continues to hint that exciting announcements are coming “soon”; and next week, Microsoft will hold its Build developer conference.

The translation is provided by third-party software.
