
Launching a voice assistant ahead of ChatGPT: French lab demonstrates AI that speaks with 70 emotions

wallstreetcn ·  02:01

According to the Kyutai lab, Moshi is the world's first publicly accessible real-time generative voice AI. It can speak with 70 different emotions and styles, and the lab showed Moshi reciting a poem it had written itself, in a strong French accent. The Moshi model will be released in the coming weeks, and its code will be shared for free.

Author of this article: Li Dan

Source: Hard AI

OpenAI has been beaten to the punch, and on the very voice assistant it had already demonstrated in public.

On Wednesday, July 3, local time, Kyutai, an independent French non-profit AI research lab, publicly demonstrated an experimental prototype of Moshi, a voice assistant. According to Kyutai, Moshi was built from scratch in six months by the lab's eight-person research team, and it is the world's first real-time generative voice AI that anyone in the public can use.

Moshi can express a range of human-like emotions. Kyutai scientists say the system can speak with 70 different emotions and styles. In a live demonstration, they had Moshi give advice on climbing Mount Everest and recite a poem of its own in a strong French accent.

Kyutai announced that an interactive demo of Moshi would go live on its website later that day, allowing anyone to try it online for free. Kyutai says it is committed to contributing to open AI research and to the development of the wider ecosystem, and that the Moshi model's code and weights will soon be shared for free, a first for this type of technology. A Kyutai representative said the Moshi model and the related research will be released in the next few weeks, but did not give a specific date.

Kyutai believes Moshi has the potential to fundamentally change how voice is used in the digital world. As one example, it says Moshi's text-to-speech capability excels at expressing emotion and at the interplay between multiple voices.

Kyutai's CEO Patrick Pérez said Moshi can “talk and think”, adding, “We believe Moshi has tremendous potential to change the way we communicate with machines.”

Researcher Lucas Beyer wrote on social media that Moshi is the first real-time audio large language model (LLM). In Kyutai's demo, Moshi responded with almost no latency and even interrupted the speakers a few times; if anything, it was a little too eager to answer. Moshi is also fully open source. Although its voice sounds somewhat robotic, it performs well for a first version. Overall, he said, it is pretty cool.

Beyer noted that during the demonstration, the model running on an Apple MacBook produced some false-positive refusals in real time, suggesting Kyutai may have been a bit overzealous with its safety tuning. But that only confirmed the demo was genuinely live, perhaps even unscripted, which he loved.

Some netizens said it was interesting to see Japanese words absorbed into Western culture in a cyberpunk way. In Japanese, "kyutai" means sphere and "moshi" is a colloquial greeting used on the phone, so together the two names mean "hello, sphere."

Moshi is seen as ChatGPT's newest challenger. More and more startups and tech giants, including Anthropic, Cohere, and Google, are launching models to compete with GPT-4, although some industry experts are concerned about the dangers of this emerging technology.

With Moshi, Kyutai has beaten OpenAI to launching a voice assistant of this kind. OpenAI had planned to offer similar functionality in its chatbot ChatGPT, and less than two months earlier it publicly presented a voice assistant built on GPT-4o, an upgraded version of GPT-4.

At OpenAI's presentation in mid-May, the voice assistant behaved much like a real person: it could listen, speak, and see, and even convey shifts in mood. Most importantly, it could respond to requests almost instantly. It could tell bedtime stories, read people's emotions from their facial expressions, calm a nervous user like a friend, and even walk someone through solving an algebraic equation like an experienced math teacher, reminding some viewers of the AI assistant in the 2013 movie "Her."

However, just over a month later, OpenAI announced that it was delaying the release of the voice assistant over safety concerns. On Tuesday, June 25, OpenAI posted on social media that the ChatGPT voice feature was delayed because it needed to make sure the feature could handle requests from millions of users safely and reliably, and that it would take another month to meet the company's bar for launch.

Some netizens commented on Tuesday that with Moshi, Kyutai had essentially released an open-source version of an OpenAI consumer product that has not yet been made public, and saluted Kyutai for it.

Unlike OpenAI, which is backed by American capital, including $15 billion from Microsoft, Kyutai is dedicated to research on general AI. Since its founding in November last year, it has received a total of 300 million euros in funding, mainly from European industrial investors.

Kyutai is one of the recipients of the 200 million euros that French billionaire Xavier Niel said last year he would invest in AI. Iliad, the telecommunications group owned by Niel, announced last year that it would invest 100 million euros in the Kyutai project. Another French billionaire, Rodolphe Saadé, CEO of the French shipping and logistics giant CMA CGM, also invested 100 million euros in Kyutai. Former Google CEO Eric Schmidt has also invested in Kyutai, in an undisclosed amount.

Niel said on Tuesday that Moshi showed Europe can be a global player in AI development: "Everything Kyutai presented today is world-class, and we are excited to launch this product in Europe."

On the security side, Kyutai's chief scientific officer Hervé Jégou explained that Kyutai will use indexing and watermarking tools to identify and track its AI-generated audio.

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.