
The AI search engine is here! Google pulls out all the stops, releasing its most powerful AI model, with a voice feature that takes on OpenAI head-on

硬AI ·  May 15 07:03

Gemini 1.5 Pro's context window is expanding to 2 million tokens, billed as the longest in the world; Gemini is adding a live voice chat feature to compete with OpenAI's new model GPT-4o; Gemini will be customizable to user needs; Google's multimodal AI project, Project Astra, can answer questions about objects captured by a phone's camera; and Gemini on Android is gaining multimodal features.

Although OpenAI took the lead with a major new product demo, Alphabet (GOOG.US), the latecomer, achieved what OpenAI has not yet managed: it was the first to launch an artificial intelligence (AI) search engine, defending its throne in search, while taking on OpenAI's newly released flagship model GPT-4o with an upgraded version of Gemini, its most powerful AI model.

At the annual Google I/O developer conference held on Tuesday, May 14, US Eastern Time, Google CEO Sundar Pichai said that all of Google's work revolves around its generative AI model Gemini: "We hope everyone can benefit from what Gemini does." AI search is just one of the many Google services into which, as Pichai noted, Gemini has been integrated.

Pichai announced that an AI-generated summary feature called AI Overviews will roll out in Google Search in the US this week, and will soon launch in more countries and regions.

Through multi-step reasoning, Gemini can do research on the user's behalf to find better search results. For example, Gemini in Google Search can plan meals for users, pulling together a full day's meals and the recipes for every dish. If cooking feels like too much trouble, Google Search can also use Gemini to find places where users can buy the dishes they need.

With Gemini's help, the search results page itself will also change, for example when looking for restaurants with live music. Results can even adapt to the season, such as surfacing restaurants with rooftop seating.

Pichai gave a live demonstration of how Gemini enables richer searches in Google Photos. With a new feature called Ask Photos with Gemini, a user can, for example, ask for their license plate number; Gemini searches the photo library, uses context to pick out the right car, and returns the plate number captured in the photo.

Many services in Google Workspace, Google's cloud productivity and collaboration platform, will incorporate Gemini, such as using Gemini to search Gmail for emails from a specific sender, or to surface highlights from Google Meet video calls.

Gemini can also search a user's phone, for example to find receipts and schedule a pickup window. If users are planning a trip, Gemini can search for fun activities. Pichai said Google is "making AI helpful for everyone."

Google says users will be able to ask questions directly with video during a search. A Google executive demonstrated using video search to fix a broken record player: first record a video showing the problem, then ask why the turntable is not working properly. Google Search analyzes the video frame by frame to answer the question.
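
The frame-by-frame Search feature is consumer-facing, but Google's Gemini API supports an analogous video Q&A flow for developers. Below is a minimal sketch using the google-generativeai Python SDK and its File API; the file name broken_turntable.mp4 and the question are illustrative assumptions, not Google's demo code.

    import time
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed

    # Upload the clip; the File API processes video asynchronously.
    video = genai.upload_file(path="broken_turntable.mp4")  # hypothetical file
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)

    # Ask a question grounded in the video's frames and audio.
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(
        [video, "Why isn't the record player in this video working properly?"]
    )
    print(response.text)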

Gemini 1.5 Pro's context window reaches 2 million tokens, the longest in the world

According to Google, within three months of launching Gemini Advanced, billed as offering its most powerful AI model, more than 1 million users have signed up.

Starting this Tuesday, Google is adding a new model, Gemini 1.5 Pro, to Gemini Advanced, claiming it has the longest context window of any consumer chatbot in the world, starting at 1 million tokens. Gemini 1.5 Pro will be available to Gemini Advanced subscribers in over 150 countries and regions, and supports over 35 languages.

According to Pichai, Gemini 1.5 Pro "provides the longest context window of any foundation model to date." He explained that Gemini 1.5 Pro will expand to a context window of 2 million tokens, double the current model's 1-million-token window.
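
To put those window sizes in perspective, the google-generativeai Python SDK exposes a count_tokens call that reports how much of the context window an input would consume. A minimal sketch, assuming a local text file war_and_peace.txt stands in for a long document:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # An entire novel fits comfortably inside a 1-million-token window.
    with open("war_and_peace.txt", encoding="utf-8") as f:  # hypothetical file
        book = f.read()

    info = model.count_tokens(book)
    print(f"{info.total_tokens:,} tokens against a 1,000,000-token window")

    # The same text can then be passed whole as context.
    response = model.generate_content([book, "Summarize the main plot threads."])
    print(response.text)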

Gemini's new voice chat feature Live, plus customizable Gems

Google says it will expand Gemini's multimodal capabilities this summer, including the ability to hold in-depth, two-way voice conversations, a feature called Live. With Gemini Live, users can talk to Gemini and choose from a variety of natural-sounding voices for its replies. Users can even speak at their own pace, or interrupt mid-answer to clarify a question, just as in any human conversation.

Some netizens commented that they were curious how Gemini's conversation feature will stack up against GPT-4o, the latest flagship model OpenAI released on Monday.

Google says it will add new trip-planning features to Gemini Advanced this summer. Using advanced reasoning that accounts for the logistics of time and place, Gemini will be able to build personalized itineraries, saving users time.

In the coming weeks, Google will add new data analysis features to Gemini Advanced. Users simply upload a spreadsheet, and Gemini can analyze the data, create charts, and surface insights faster.
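
The spreadsheet feature belongs to the consumer Gemini Advanced product, but the same idea can be sketched against the Gemini API by passing tabular data inline as text. The file sales.csv and the prompt below are illustrative assumptions, not Google's implementation:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Read the spreadsheet as plain text and include it in the prompt.
    with open("sales.csv", encoding="utf-8") as f:  # hypothetical file
        table = f.read()

    response = model.generate_content(
        ["Here is a CSV of monthly sales by region:", table,
         "Which region grew fastest, and what trends stand out?"]
    )
    print(response.text)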

Google will also launch customized versions of Gemini called Gems. Gemini Advanced subscribers will soon be able to get a more personalized experience by creating a Gemini tailored to their own needs: simply describe what a Gem should do and how it should respond, and turn it into a fitness partner, cooking helper, coding partner, or creative writing guide.

For example, a user can tell Gemini: act as my running coach, give me a daily running plan, and keep me positive, optimistic, and motivated. Gemini takes these instructions and, with one click, creates a Gem that embodies those traits and meets the user's specific needs.
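
Google has not said how Gems are implemented, but the developer-facing Gemini API achieves a similar effect through its system_instruction parameter. A minimal sketch, with the coaching persona text assumed for illustration:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed

    # The persona text stands in for what a user would type when creating a Gem.
    coach = genai.GenerativeModel(
        model_name="gemini-1.5-pro",
        system_instruction=(
            "You are my running coach. Give me a daily running plan, "
            "and keep me positive, optimistic, and motivated."
        ),
    )

    chat = coach.start_chat()
    reply = chat.send_message("I only have 30 minutes today. What should I run?")
    print(reply.text)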

Project Astra answers questions about what the phone camera captures; Gemini on Android gains multimodal features

Google officially announced Project Astra, a new multimodal AI project that can explain to users whatever they point their smartphones at. In a video Google showed, simply aiming the phone's camera at an object lets Gemini recognize it, such as a red apple, and answer questions such as which object in view can make sound.
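
Project Astra itself is a research prototype, but single-image Q&A of the kind shown in the demo is already possible with the Gemini API. A minimal sketch, assuming a saved snapshot apple.jpg stands in for a live camera frame:

    import PIL.Image
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key, assumed
    model = genai.GenerativeModel("gemini-1.5-pro")

    # A still image stands in for the live camera feed in the Astra demo.
    frame = PIL.Image.open("apple.jpg")  # hypothetical snapshot

    response = model.generate_content([frame, "What object is this?"])
    print(response.text)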

Google says it will soon bring multimodal capabilities to the Gemini Nano model. This means a user's phone will be able to understand the world the way the user does: through text, images, sounds, and spoken language.

According to Google, Gemini Nano running on Android devices will become more helpful and context-aware. This year, Android phone users will be able to drag and drop generated images into Google Messages and Gmail, and ask questions about YouTube videos and PDF files directly on their phones and get answers.

Google says that later this year, TalkBack, the Android accessibility feature, will be enhanced by Gemini Nano, with clearer and richer image descriptions that help blind and low-vision users navigate their phones through voice feedback.

Editor/Somer


