
A ticket to AI's next “iPhone moment”? Tech giants converge on voice interaction

cls.cn ·  May 14 21:13

① OpenAI has released GPT-4o, Apple and Meta are exploring earphones with cameras, and Siri is expected to integrate ChatGPT. All of these rely on voice as a main form of interaction. ② In the view of OpenAI's Sam Altman, voice interaction is an important path to future interaction: “The ability to interact in multiple modes is very powerful.”

Science and Technology Innovation Board Daily, May 14: Audio response times as short as 232 ms, the ability to sense human emotion, and conversation that feels like talking with a real person. At OpenAI's launch event last night, the strong performance of GPT-4o, its latest multimodal large model, drew wide attention. Other tech giants have not slowed their pursuit, and AI human-computer interaction, especially voice interaction, may become the focus of a new round of competition.

According to The Information, citing sources familiar with the matter, Meta Platforms (META.US) has set up a project called “Camerabuds” to explore making AI-powered earphones with cameras, in the hope that they can recognize objects and translate foreign languages.

Meta had previously released a new generation of Ray-Ban smart glasses with built-in multimodal AI features. When users wearing the glasses say “Hey, Meta,” they summon a virtual assistant that can see and hear what is happening around them, describe objects, translate, and suggest outfit pairings.

At the same time, Apple (AAPL.US) is carrying out similar explorations.

Apple is close to reaching an agreement with OpenAI and may introduce a ChatGPT-powered “chatbot” in iOS 18, which is expected to have a disruptive impact on Siri, Apple's personal voice assistant. It was previously reported that the company is also exploring AirPods with cameras. The AI would use images captured by the cameras, together with multimodal voice and image systems, to help users keep track of their daily activities and assist with their everyday work.

Whether it is GPT-4o, earphones with cameras, or an “upgraded Siri” that is planned to incorporate ChatGPT, the main form of interaction in each case depends on voice.

In the voice interaction race, OpenAI has taken the lead for now with GPT-4o, which also brings the company a step closer to more natural human-computer interaction. According to a report published today by Huafu Securities, GPT-4o lays the foundation for AI voice assistants in three respects: low latency, emotion perception, and visual perception. Of these, emotion perception enriches what had been a one-dimensional speech output mode, while visual perception can be adapted to AI phones, AI PCs, and AI smart hardware.

In an interview a few days ago, Sam Altman was asked what (revolutionary) device will come after the iPhone. “I think you have to find some really different interaction paradigm to make that kind of device possible,” Altman said. “We're going to further improve (voice functionality).”

In his view, voice interaction is an important path to future interaction. “The ability to interact in multiple modes is very powerful. For example, you could ask ChatGPT, 'Hey ChatGPT, what am I looking at?' or 'What plant is this?'”

Looking back over the history of the technology, from AlphaGo, which could not speak at all, to Apple's Siri and the voice version of ChatGPT, which could listen and talk, to today's GPT-4o, AI human-computer interaction has moved steadily closer to the way people communicate with each other.

For the general public, the massive training data sets, computing power requirements, and parameter counts that tech giants promote are hard to experience directly. Lower prices, a lower barrier to use, and a more natural way of communicating are what users actually feel in daily use, and they may also become decisive factors in the AI competition ahead.

Editor/jayden


