AI is reaching a turning point: Will Google Gemini 2.0 mark the beginning of autonomous AI?

Sina Technology · Dec 12 21:08

On the evening of December 12, Beijing time, Google released Gemini 2.0, its next-generation large AI model, marking an ambitious step toward AI systems that can complete complex tasks independently. Gemini 2.0 also introduces native image generation and multilingual audio capabilities, putting Google in direct competition with OpenAI and Anthropic in an increasingly fierce AI race.

The release comes one year after Google first introduced Gemini, at a critical moment in AI development. These new "agentic" AI systems can not only respond to queries but also understand nuanced context, plan multiple steps ahead, and take supervised actions on behalf of users.

How will Google's new AI assistant reshape everyday digital life?

At a recent press briefing, Tulsee Doshi, Director of Product Management for Gemini, outlined the system's enhanced capabilities while demonstrating real-time image generation and multilingual conversation. "Gemini 2.0 brings enhanced performance and new capabilities such as native image and multilingual audio generation," Doshi explained. "It also has native tool use, which means it can directly access Google products like Search and even execute code."
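To make the idea of "native tool use" more concrete, here is a minimal Python sketch of the general pattern, assuming the model emits a JSON request that names a tool and its arguments. The tool registry, the `search` and `add_numbers` stubs, and the JSON shape are all hypothetical illustrations; this is not Gemini's actual function-calling interface.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical registry mapping tool names to Python functions.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def search(query: str) -> str:
    # Hypothetical stub standing in for a real search backend.
    return f"stub results for {query}"


@tool
def add_numbers(a: float, b: float) -> str:
    # Hypothetical stub standing in for "executing code": one safe computation.
    return str(a + b)


def dispatch(model_output: str) -> str:
    """Parse a JSON tool request emitted by the model and run the named tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']}"
    return fn(**call.get("args", {}))


print(dispatch('{"tool": "add_numbers", "args": {"a": 2, "b": 3}}'))  # 5
```

In a real agent, the tool's result would be passed back to the model as context for its next step; this sketch stops at the dispatch stage.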

The initial release centers on Gemini 2.0 Flash, an experimental model that Google says runs twice as fast as its predecessor while surpassing the capabilities of some more powerful models. This is a significant technical achievement, since speed gains have typically come at the cost of capability.

Enter the next generation of AI agents

Perhaps most importantly, Google introduced three prototype AI agents built on the Gemini 2.0 architecture, showcasing the company's vision for the future of AI. Project Astra, an upgraded universal AI assistant, showed that it can hold complex conversations across multiple languages while accessing Google tools and retaining contextual memory of previous interactions.

Bibo Xu, a product manager on the Google DeepMind team, explained during a live demo: "Project Astra now has up to 10 minutes of in-session memory and can remember past conversations you've had with it, so you get a more useful and personalized experience."
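Google has not described how Astra's memory is implemented. As a purely hypothetical illustration of what "up to 10 minutes of in-session memory" could mean, the Python sketch below keeps conversation turns in a time-stamped buffer and evicts any turn older than a 600-second window. The `SessionMemory` class and its methods are invented for this example.

```python
from __future__ import annotations

import time
from collections import deque


class SessionMemory:
    """Hypothetical rolling memory that keeps only turns from the last window_seconds."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds
        self._turns: deque[tuple[float, str]] = deque()  # (timestamp, text), oldest first

    def add(self, text: str, now: float | None = None) -> None:
        # Record a turn with its timestamp, then drop anything that has aged out.
        now = time.monotonic() if now is None else now
        self._turns.append((now, text))
        self._evict(now)

    def context(self, now: float | None = None) -> list[str]:
        # Return the turns still inside the window, oldest first.
        now = time.monotonic() if now is None else now
        self._evict(now)
        return [text for _, text in self._turns]

    def _evict(self, now: float) -> None:
        # Pop from the left while the oldest turn is older than the window.
        while self._turns and now - self._turns[0][0] > self.window_seconds:
            self._turns.popleft()


mem = SessionMemory(window_seconds=600)
mem.add("my locker number is 42", now=0)
mem.add("what is the weather", now=300)
print(mem.context(now=700))  # ['what is the weather']
```

A deque is used because eviction always removes the oldest turn, so popping from the left is a constant-time operation.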

The system can switch smoothly between languages and access real-time information through Google Search and Maps, a level of integration not previously seen in consumer AI products.

The battle for enterprise AI intensifies

For developers and enterprise customers, Google introduced Project Mariner and Jules, two specialized AI agents designed to automate complex technical tasks. Project Mariner, demonstrated as a Chrome extension, achieved a success rate of 83.5% on the WebVoyager benchmark, a significant improvement over previous attempts at autonomous web navigation. WebVoyager primarily tests an agent's performance on end-to-end, real-world web tasks.

Jaclyn Konzelmann, Director of Product Management at Google Labs, said: "Project Mariner is an early research prototype that explores agents' ability to browse the web and take actions. When evaluated on the WebVoyager benchmark, Project Mariner achieved an impressive success rate of 83.5%."

Custom silicon: The infrastructure behind Google's AI ambitions

Underpinning these advances is Trillium, Google's sixth-generation Tensor Processing Unit (TPU), which is now generally available to cloud customers. The custom AI accelerator represents a massive investment in computing infrastructure: Google has deployed more than 100,000 Trillium chips in a single network fabric.

Logan Kilpatrick, product manager for Google AI Studio and the Gemini API, highlighted the practical impact of this infrastructure investment at the press briefing. "Flash usage has grown by more than 900%, which is incredible," Kilpatrick said. "You know, over the past few months we've launched six experimental models, and millions of developers are now using Gemini."

The road ahead: Safety concerns and competition in the era of autonomous AI

Google's shift toward autonomous agents may be the most significant strategic pivot in AI since OpenAI released ChatGPT. While competitors have focused on enhancing the capabilities of large language models, Google believes the future belongs to AI systems that can actively navigate digital environments and complete complex tasks with minimal human intervention.

This vision of AI agents that can think, plan, and act marks a departure from today's reactive AI-assistant model. It is a risky bet, since autonomous systems may bring greater safety concerns and technical challenges, but if it succeeds it could reshape the competitive landscape. Google's massive investment in custom silicon and infrastructure suggests the company is prepared to compete aggressively in this new direction.

Still, the transition to more autonomous AI systems raises new safety and ethical questions. Google has emphasized its commitment to responsible development, including extensive testing with trusted users and built-in safety measures. Its gradual rollout of these features, starting with developers and trusted testers, also signals an awareness of the potential risks of deploying autonomous AI systems.

The release of Gemini 2.0 comes at a critical moment, as Google faces mounting pressure from competitors and intense scrutiny over AI safety. Microsoft and OpenAI have made major advances in AI development this year, while other companies such as Anthropic have gained traction with enterprise customers.

Shrestha Basu Mallick, a product manager on the Gemini API team, stressed at the press briefing: "We firmly believe the only way to build AI is to be responsible from the very beginning. As we advance our models and agents, we will continue to prioritize safety and responsibility as key elements of the model development process."

As these systems become increasingly capable of taking actions in the real world, they could fundamentally reshape how people interact with technology. The success of Gemini 2.0 could determine not only Google's position in the AI market but also the broader trajectory of AI development as the industry moves toward more autonomous systems.

A year ago, when Google launched the first version of Gemini, the AI field was dominated by chatbots that could hold clever conversations but struggled with real-world tasks. Now, as AI agents take their first steps toward autonomy, the industry is at another turning point. The question is no longer whether AI can understand us, but whether we are ready to let AI act on our behalf. Google is placing its bet, and it is a big one.

The translation is provided by third-party software.

The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.
