
Google Targets OpenAI, Concentrating Its Firepower on AI Agents

cls.cn ·  Dec 12, 2024 10:33

① Gemini 2.0 is the latest-generation model built to support Google's work on AI agents; ② The first release is the experimental Gemini 2.0 Flash, which outperforms Gemini 1.5 Pro.

"Science and Technology Innovation Board Daily", December 12 (Editor Song Ziqiao) — On December 12, just as OpenAI announced ChatGPT's full integration into Apple devices, Google released its new-generation model, Gemini 2.0. Notably, Gemini 2.0 was built specifically for AI agents.

Google CEO Sundar Pichai said in an open letter: "Over the past year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision. Today, we're excited to launch our next era of models built for this new agentic era: Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant."

Demis Hassabis, CEO of Google DeepMind, added that 2025 will be the era of AI agents, and Gemini 2.0 will be the latest-generation model underpinning Google's agent work.

Gemini 2.0 has not yet been officially launched; Google said it has been made available to some developers for closed testing. The first release is the experimental Gemini 2.0 Flash, which is more powerful than Gemini 1.5 Pro. The experimental version is already available on the web: Gemini users can access Gemini 2.0 Flash on desktop, with the mobile version to follow soon.
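For developers with early access, the experimental model can be reached through Google's public Generative Language REST API. The sketch below is a minimal illustration, assuming the model identifier `gemini-2.0-flash-exp` and the standard `generateContent` payload shape from Google's API documentation; the network call itself is only attempted when a `GEMINI_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request

# Assumed model identifier for the experimental release; actual
# availability depends on your access tier.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build the generateContent JSON payload for a plain text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

payload = build_request("Summarize the Gemini 2.0 announcement in one sentence.")
print(json.dumps(payload))

# Only attempt the real API call when a key is configured.
api_key = os.environ.get("GEMINI_API_KEY")
if api_key:
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        # Extract the first candidate's text from the response.
        print(body["candidates"][0]["content"]["parts"][0]["text"])
```

Keeping the request construction separate from the call makes the payload easy to inspect or test without credentials.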

According to benchmark results released by Google, the experimental Gemini 2.0 Flash surpasses Gemini 1.5 Pro almost across the board — in multimodal image and video capabilities as well as coding, math, and other tasks — while responding twice as fast.

Google concentrates its firepower on AI agents

This update offers a glimpse of the tip of the iceberg of Google's AI strategy: everything is built around agents.

1. More powerful multi-modal capabilities:

In addition to accepting multimodal inputs such as images, video, and audio, the experimental Gemini 2.0 Flash also supports multimodal output, such as natively generated images mixed with text, and steerable multilingual text-to-speech (TTS) audio.

2. More professional AI search:

Google has launched a new agentic feature in Gemini Advanced called Deep Research. It combines Google's search expertise with Gemini's advanced reasoning to generate research reports on complex topics, acting as a personal research assistant.

3. A batch of agents updated or launched:

Updated Project Astra, an agent built on Gemini 2.0: Astra's new capabilities include support for mixed multilingual conversations; the ability to call Google Lens and Maps directly from the Gemini app; improved memory, with up to 10 minutes of in-session memory for more coherent conversations; and, thanks to new streaming technology and native audio understanding, the agent can understand speech with latency approaching that of human conversation. Notably, Project Astra feeds into Google's smart-glasses effort: Google said it is porting Astra to more form factors, such as glasses.

Released Project Mariner, an agent for the browser: it can understand and reason over information on the browser screen, including pixels and web elements (such as text, code, and images), and then use that information via a Chrome extension to complete tasks for you.

Released Jules, an AI coding agent built specifically for developers: Jules integrates directly into GitHub workflows, letting users describe a problem in natural language and generate code that can be merged into GitHub projects.

Released game agents: they can interpret the game screen in real time, suggest next moves based on your in-game actions, or talk with you by voice while you play.

Google said it will bring Gemini 2.0 to more of its products early next year. The previously launched AI Overviews will integrate Gemini 2.0 to better handle complex queries, including advanced mathematical formulas, multimodal questions, and programming. Limited testing began this week, with a broader rollout — to more countries and languages — expected next year.


