
Google Targets OpenAI, Concentrating Its Firepower on AI Agents

cls.cn ·  Dec 12, 2024 10:33

① Gemini 2.0 is the latest-generation model built to support Google's work on AI agents; ② The first release is the experimental Gemini 2.0 Flash, which outperforms Gemini 1.5 Pro.

"Science and Technology Innovation Board Daily", December 12 (Editor Song Ziqiao) — On December 12, just as OpenAI announced ChatGPT's full integration into Apple devices, Google released its new-generation model, Gemini 2.0. Notably, Gemini 2.0 was built specifically for AI agents.

Google CEO Sundar Pichai said in an open letter: "Over the past year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision. Today, we're excited to launch our next era of models built for this new agentic era: Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant."

Demis Hassabis, CEO of Google DeepMind, added that 2025 will be the era of AI agents, and Gemini 2.0 will be the latest-generation model underpinning Google's agent work.

Gemini 2.0 has not yet been officially launched; Google said it has been made available to some developers for closed testing. The first release is the experimental Gemini 2.0 Flash, which is more powerful than Gemini 1.5 Pro. The experimental version is already available on the web: Gemini users can access Gemini 2.0 Flash on desktop, with the mobile version to follow soon.
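For developers with early access, the experimental model can be reached through Google's public Generative Language REST API. The sketch below is a minimal illustration, assuming the model identifier `gemini-2.0-flash-exp` and the standard `generateContent` payload shape from Google's API documentation; the network call itself is only attempted when a `GEMINI_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request

# Assumed model identifier for the experimental release; actual
# availability depends on your access tier.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build the generateContent JSON payload for a plain text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

payload = build_request("Summarize the Gemini 2.0 announcement in one sentence.")
print(json.dumps(payload))

# Only attempt the real API call when a key is configured.
api_key = os.environ.get("GEMINI_API_KEY")
if api_key:
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        # Extract the first candidate's text from the response.
        print(body["candidates"][0]["content"]["parts"][0]["text"])
```

Keeping the request construction separate from the call makes the payload easy to inspect or test without credentials.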

According to benchmark results released by Google, the experimental Gemini 2.0 Flash surpasses Gemini 1.5 Pro almost across the board — in multimodal image and video capabilities as well as coding, math, and other tasks — while responding twice as fast.

Google concentrates its firepower on AI agents

This update offers a glimpse of the tip of the iceberg of Google's AI strategy: everything is built around agents.

1. More powerful multi-modal capabilities:

In addition to accepting multimodal inputs such as images, video, and audio, the experimental Gemini 2.0 Flash also supports multimodal output, such as natively generated images mixed with text, and steerable multilingual text-to-speech (TTS) audio.

2. More professional AI search:

Google has launched a new agentic feature in Gemini Advanced called Deep Research. It combines Google's search expertise with Gemini's advanced reasoning to generate research reports on complex topics, acting as a personal research assistant.

3. A batch of agents updated or launched:

Updated Project Astra, an agent built on Gemini 2.0: Astra's new capabilities include support for mixed multilingual conversations; the ability to call Google Lens and Maps directly from the Gemini app; improved memory, with up to 10 minutes of in-session memory for more coherent conversations; and, thanks to new streaming technology and native audio understanding, the agent can understand speech with latency approaching that of human conversation. Notably, Project Astra feeds into Google's smart-glasses effort: Google said it is porting Astra to more form factors, such as glasses.

Released Project Mariner, an agent for the browser: it can understand and reason over information on the browser screen, including pixels and web elements (such as text, code, and images), and then use that information via a Chrome extension to complete tasks for you.

Released Jules, an AI coding agent built specifically for developers: Jules integrates directly into GitHub workflows, letting users describe a problem in natural language and generate code that can be merged into GitHub projects.

Released game agents: they can interpret the game screen in real time, suggest next moves based on your in-game actions, or talk with you by voice while you play.

Google said it will bring Gemini 2.0 to more of its products early next year. The previously launched AI Overviews will integrate Gemini 2.0 to better handle complex queries, including advanced mathematical formulas, multimodal questions, and programming. Limited testing began this week, with a broader rollout — to more countries and languages — expected next year.


