At the beginning of 2025, Google will expand Gemini 2.0 to more Google products.
According to Zhituo Finance APP, Haitong Securities has released a research report stating that Gemini 2.0 has been optimized and improved in a series of areas, including native user interface action capabilities, multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use, and latency. These improvements give Gemini 2.0 a clearer advantage in strengthening application capabilities and building AI agents, and the rollout of AI applications and the development of AI agents are expected to keep accelerating.
The main points of Haitong Securities are as follows:
The release of Gemini 2.0 shows significant progress in low latency and multimodal capabilities.
On December 12, Google released the first model in the Gemini 2.0 series: an experimental version of Gemini 2.0 Flash. This is Google's workhorse model, offering low latency and strong performance at the cutting edge of Google's technology, at scale. Compared with 1.5 Flash, Gemini 2.0 Flash delivers better performance while maintaining the same fast response time. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks while running at twice the speed of 1.5 Pro.
2.0 Flash also adds new features. In addition to supporting multimodal inputs such as images, video, and audio, it now supports multimodal outputs: it can directly generate content that mixes images and text, and it can natively generate controllable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions. At the beginning of 2025, Google will expand Gemini 2.0 to more Google products.
Google is highly focused on the field of AI agents, and Gemini 2.0 has become a key foundation for that work.
The practical application of AI agents is an exciting research field with great potential. Gemini 2.0 allows Google to build new AI agents, bringing Google closer to its vision of creating a universal assistant.
Project Astra: An agent that uses multimodal understanding of the real world.
Project Astra is an advanced visual and conversational agent introduced by Google in May, designed to help build the AI assistant of the future. Built on Gemini 2.0, Project Astra has gained several improvements:
Smoother conversations: Project Astra can now converse in multiple languages and mixed languages, and it understands different accents and obscure words better.
Use of new tools: With Gemini 2.0, Project Astra can utilize Google Search, Google Lens, and Google Maps, thereby enhancing its role as an assistant in daily life.
Better memory: Google has enhanced Project Astra's memory capabilities while keeping users in control. It can now remember up to 10 minutes of conversation within a session and recall more past conversations, allowing it to provide more personalized service.
Lower latency: With new streaming technology and native audio understanding, the agent can understand language at a latency close to that of human conversation.
Project Mariner: An agent that helps users complete complex tasks.
Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-machine interaction, starting with the user's browser. As a research prototype, it can understand and reason about information on a browser page, including pixels and web elements such as text, code, images, and forms, and then use this information to complete tasks for users through an experimental Chrome extension. On the WebVoyager benchmark (which evaluates agent performance on end-to-end real-world web tasks), Project Mariner achieved a score of 83.5% in a single-agent setup, a state-of-the-art result.
Jules: An AI agent for developers.
Next, Google will explore how AI agents can assist developers through Jules, an experimental AI coding agent that integrates directly into GitHub workflows. It can tackle an issue, draw up a plan, and execute it, all under the guidance and supervision of developers. This work is part of Google's long-term goal of building AI agents that can assist in all domains, including coding.
Agents in gaming and other fields.
Google has used Gemini 2.0 to build agents that help users make smarter decisions in video games by analyzing the state of the game from real-time visuals on the screen and suggesting the next course of action. Google is working with leading game developers such as Supercell to explore how agents can be applied in gaming, testing the agents across a variety of games to evaluate their ability to understand game rules and respond to challenges. These agents can also draw on Google Search to provide users with rich game-related knowledge. Google is also applying the spatial reasoning capabilities of Gemini 2.0 to robotics, experimenting with agents that can assist in the real world.
Risk reminder: The development of AI technology may fall short of expectations, and the application of AI may not materialize as anticipated.