Haitong Securities: The release of Gemini 2.0 drives Google into the "agent era."

At the beginning of 2025, Google will expand Gemini 2.0 to more Google products.

According to Zhitong Finance APP, Haitong Securities has released a research report stating that Gemini 2.0 has been improved in a range of areas, including native user interface action capabilities, multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use, and latency. These improvements give Gemini 2.0 a stronger position in enhancing applications and building AI agents, and the rollout of AI applications and the development of AI agents are expected to keep accelerating.

Haitong Securities' main points are as follows:

The release of Gemini 2.0 shows significant progress in low latency and multimodal capabilities.

On December 12, Google released the first model in the Gemini 2.0 family: an experimental version of Gemini 2.0 Flash. This is Google's workhorse model, offering low latency and strong performance at the cutting edge of Google's technology at scale. Compared with 1.5 Flash, Gemini 2.0 Flash delivers better performance at the same fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks while running at twice its speed.

2.0 Flash also adds new capabilities. In addition to supporting multimodal inputs such as images, video, and audio, it now supports multimodal outputs: for example, it can directly generate content that mixes images and text, and it can natively generate steerable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions. In early 2025, Google will expand Gemini 2.0 to more Google products.

Google is focusing heavily on AI agents, and Gemini 2.0 is an important enabler.

The practical application of AI agents is an exciting research field full of possibilities. Gemini 2.0 allows Google to build new AI agents, bringing it closer to its vision of a universal assistant.

Project Astra: An agent that uses multimodal understanding in the real world.

Project Astra is an advanced visual and conversational agent that Google introduced in May as a step toward building the AI assistant of the future. Built on Gemini 2.0, Project Astra has gained several improvements:

Smoother conversations: Project Astra can now converse in multiple languages and in mixed languages, and it better understands different accents and uncommon words.

Use of new tools: With Gemini 2.0, Project Astra can use Google Search, Google Lens, and Google Maps, making it more useful as an assistant in daily life.

Better memory: Google has improved Project Astra's memory while ensuring that users stay in control. It can now remember up to 10 minutes of conversation within a session and recall more of the user's past conversations with it, so that it can provide better personalization.

Lower latency: With new streaming capabilities and native audio understanding, the agent can understand language at a latency close to that of human conversation.

Project Mariner: An agent that helps users complete complex tasks.

Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with the user's browser. As a research prototype, it can understand and reason about information on browser pages, including pixels and web elements such as text, code, images, and forms, and then use this information to complete tasks for users through an experimental Chrome extension. On the WebVoyager benchmark, which tests agent performance on end-to-end real-world web tasks, Project Mariner achieved a score of 83.5% working as a single-agent setup, a state-of-the-art result.

Jules: An AI agent for developers.

Google is also exploring how AI agents can assist developers through Jules, an experimental AI code agent that integrates directly into GitHub workflows. It can tackle an issue, then develop and execute a plan, all under the developer's direction and supervision. This work is part of Google's long-term goal of building AI agents that can help in all domains, including coding.

Agents in gaming and other fields.

Google has used Gemini 2.0 to build agents that help users make better decisions in video games: they analyze the game based on the real-time action on screen and suggest what to do next. Google is working with leading game developers such as Supercell to explore how agents can be applied in gaming, and it evaluates the agents' ability to understand game rules and respond to challenges by testing them across a variety of games. These agents can also draw on Google Search to connect users with a wealth of game-related knowledge. Google is also applying Gemini 2.0's spatial reasoning capabilities to robotics, with the aim of having agents assist in the physical world.

Risk warning: AI technology may develop more slowly than expected, and the rollout of AI applications may fall short of expectations.

