
The AI search engine is here! Google pulls out all the stops, releasing its most powerful AI model, with a voice feature that takes on OpenAI head-on

硬AI ·  May 15 07:03

Gemini 1.5 Pro's context window is expanding to 2 million tokens, billed as the longest in the world; Gemini is adding a live voice chat feature to compete with OpenAI's new model GPT-4o; Gemini will be customizable to user needs; Google's multimodal AI project, Project Astra, can answer questions about objects captured by a phone's camera; and Gemini on Android is gaining multimodal features.

Although OpenAI took the lead with a major new product demo, Alphabet (GOOG.US), the latecomer, achieved what OpenAI has not yet managed: it was the first to launch an artificial intelligence (AI) search engine, defending its throne in search, while taking on OpenAI's newly released flagship model GPT-4o with an upgraded version of Gemini, its most powerful AI model.

At the annual Google I/O developer conference held on Tuesday, May 14, US Eastern Time, Google CEO Sundar Pichai said that all of Google's work revolves around its generative AI model Gemini: "We hope everyone can benefit from what Gemini does." AI search is just one of the many Google services into which, as Pichai noted, Gemini has been integrated.

Pichai announced that an AI-generated summary feature called AI Overviews will roll out in Google Search in the US this week, and will soon launch in more countries and regions.

Through multi-step reasoning, Gemini can do research on the user's behalf to find better search results. For example, Gemini in Google Search can plan meals for users, pulling together a full day's meals and the recipes for every dish. If cooking feels like too much trouble, Google Search can also use Gemini to find places where users can buy the dishes they need.

With Gemini's help, the search results page itself will also change, for example when looking for restaurants with live music. Results can even adapt to the season, such as surfacing restaurants with rooftop seating.

Pichai gave a live demonstration of how Gemini enables richer searches in Google Photos. With a new feature called Ask Photos with Gemini, a user can, for example, ask for their license plate number; Gemini searches the photo library, uses context to pick out the right car, and returns the plate number captured in the photo.

Many services in Google Workspace, Google's cloud productivity and collaboration platform, will incorporate Gemini, such as using Gemini to search Gmail for emails from a specific sender, or to surface highlights from Google Meet video calls.

Gemini can also search a user's phone, for example to find receipts and schedule a pickup window. If users are planning a trip, Gemini can search for fun activities. Pichai said Google is "making AI helpful for everyone."

Google says users will be able to ask questions directly with video during a search. A Google executive demonstrated using video search to fix a broken record player: first record a video showing the problem, then ask why the turntable is not working properly. Google Search analyzes the video frame by frame to answer the question.
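
The frame-by-frame Search feature is consumer-facing, but Google's Gemini API supports an analogous video Q&A flow for developers. Below is a minimal sketch using the google-generativeai Python SDK and its File API; the file name broken_turntable.mp4 and the question are illustrative assumptions, not Google's demo code.

    import time
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed

    # Upload the clip; the File API processes video asynchronously.
    video = genai.upload_file(path="broken_turntable.mp4")  # hypothetical file
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)

    # Ask a question grounded in the video's frames and audio.
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(
        [video, "Why isn't the record player in this video working properly?"]
    )
    print(response.text)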

Gemini 1.5 Pro's context window reaches 2 million tokens, the longest in the world

According to Google, within three months of launching Gemini Advanced, billed as offering its most powerful AI model, more than 1 million users have signed up.

Starting this Tuesday, Google is adding a new model, Gemini 1.5 Pro, to Gemini Advanced, claiming it has the longest context window of any consumer chatbot in the world, starting at 1 million tokens. Gemini 1.5 Pro will be available to Gemini Advanced subscribers in over 150 countries and regions, and supports over 35 languages.

According to Pichai, Gemini 1.5 Pro "provides the longest context window of any foundation model to date." He explained that Gemini 1.5 Pro will expand to a context window of 2 million tokens, double the current model's 1-million-token window.
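
To put those window sizes in perspective, the google-generativeai Python SDK exposes a count_tokens call that reports how much of the context window an input would consume. A minimal sketch, assuming a local text file war_and_peace.txt stands in for a long document:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # An entire novel fits comfortably inside a 1-million-token window.
    with open("war_and_peace.txt", encoding="utf-8") as f:  # hypothetical file
        book = f.read()

    info = model.count_tokens(book)
    print(f"{info.total_tokens:,} tokens against a 1,000,000-token window")

    # The same text can then be passed whole as context.
    response = model.generate_content([book, "Summarize the main plot threads."])
    print(response.text)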

Gemini's new voice chat feature Live, plus customizable Gems

Google says it will expand Gemini's multimodal capabilities this summer, including the ability to hold in-depth, two-way voice conversations, a feature called Live. With Gemini Live, users can talk to Gemini and choose from a variety of natural-sounding voices for its replies. Users can even speak at their own pace, or interrupt mid-answer to clarify a question, just as in any human conversation.

Some netizens commented that they were curious how Gemini's conversation feature will stack up against GPT-4o, the latest flagship model OpenAI released on Monday.

Google says it will add new trip-planning features to Gemini Advanced this summer. Using advanced reasoning that accounts for the logistics of time and place, Gemini will be able to build personalized itineraries, saving users time.

In the coming weeks, Google will add new data analysis features to Gemini Advanced. Users simply upload a spreadsheet, and Gemini can analyze the data, create charts, and surface insights faster.
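
The spreadsheet feature belongs to the consumer Gemini Advanced product, but the same idea can be sketched against the Gemini API by passing tabular data inline as text. The file sales.csv and the prompt below are illustrative assumptions, not Google's implementation:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Read the spreadsheet as plain text and include it in the prompt.
    with open("sales.csv", encoding="utf-8") as f:  # hypothetical file
        table = f.read()

    response = model.generate_content(
        ["Here is a CSV of monthly sales by region:", table,
         "Which region grew fastest, and what trends stand out?"]
    )
    print(response.text)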

Google will also launch customized versions of Gemini called Gems. Gemini Advanced subscribers will soon be able to get a more personalized experience by creating a Gemini tailored to their own needs: simply describe what a Gem should do and how it should respond, and turn it into a fitness partner, cooking helper, coding partner, or creative writing guide.

For example, a user can tell Gemini: act as my running coach, give me a daily running plan, and keep me positive, optimistic, and motivated. Gemini takes these instructions and, with one click, creates a Gem that embodies those traits and meets the user's specific needs.
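
Google has not said how Gems are implemented, but the developer-facing Gemini API achieves a similar effect through its system_instruction parameter. A minimal sketch, with the coaching persona text assumed for illustration:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed

    # The persona text stands in for what a user would type when creating a Gem.
    coach = genai.GenerativeModel(
        model_name="gemini-1.5-pro",
        system_instruction=(
            "You are my running coach. Give me a daily running plan, "
            "and keep me positive, optimistic, and motivated."
        ),
    )

    chat = coach.start_chat()
    reply = chat.send_message("I only have 30 minutes today. What should I run?")
    print(reply.text)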

Project Astra answers questions about what the phone camera captures; Gemini on Android gains multimodal features

Google officially announced Project Astra, a new multimodal AI project that can explain to users whatever they point their smartphones at. In a video Google showed, simply aiming the phone's camera at an object lets Gemini recognize it, such as a red apple, and answer questions such as which object in view can make sound.
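
Project Astra itself is a research prototype, but single-image Q&A of the kind shown in the demo is already possible with the Gemini API. A minimal sketch, assuming a saved snapshot apple.jpg stands in for a live camera frame:

    import PIL.Image
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # A still image stands in for the live camera feed in the Astra demo.
    frame = PIL.Image.open("apple.jpg")  # hypothetical snapshot

    response = model.generate_content([frame, "What object is this?"])
    print(response.text)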

Google says it will soon bring multimodal capabilities to the Gemini Nano model. This means a user's phone will be able to understand the world the way the user does: through text, images, sounds, and spoken language.

According to Google, Gemini Nano running on Android devices will become more helpful and context-aware. This year, Android phone users will be able to drag and drop generated images into Google Messages and Gmail, and ask questions about YouTube videos and PDF files directly on their phones and get answers.

Google says that later this year, TalkBack, the Android accessibility feature, will be enhanced by Gemini Nano, with clearer and richer image descriptions that help blind and low-vision users navigate their phones through voice feedback.

Editor/Somer


