
OpenAI has fully released its human-like ChatGPT voice assistant, which speaks more than 50 languages, including Chinese.

Hard AI ·  Sep 25 09:08


ChatGPT's Advanced Voice mode is rolling out this week to ChatGPT Plus and Team subscribers, starting in the US, with availability for Edu and Enterprise subscribers next week. The updated mode supports custom instructions, adds five new voices, removes the voice criticized as a knockoff of the 'Black Widow' actress, improves accents, and makes conversations faster and more fluent.

Four months after first unveiling it, OpenAI has finally rolled out the human-like advanced AI voice assistant to its paid users.

On Tuesday, September 24 (US Eastern Time), OpenAI announced that all paying subscribers to the ChatGPT Plus and Team plans will get the new ChatGPT Advanced Voice mode, which will roll out gradually over the coming days, starting in the US market. Next week, the feature will reach subscribers to OpenAI's Edu and Enterprise plans.

This week, both individual users on the Plus plan and small-business users on the Team plan can converse with ChatGPT simply by speaking, with no need to type prompts manually. When users access Advanced Voice mode in the app, a pop-up window notifies them that they have entered the advanced voice assistant mode.

OpenAI has added two functions to the voice version of ChatGPT: support for 'custom instructions' and a 'memory' that retains how users want the voice assistant to behave, similar to the memory feature introduced for the text version of ChatGPT this April. Users can use these functions to personalize the voice mode so the AI assistant responds according to their conversational preferences.

On Tuesday, OpenAI released five new voices, named Arbor, Maple, Sol, Spruce, and Vale, joining the four voices already available in the old voice mode: Breeze, Juniper, Cove, and Ember. Among these nine options, the controversial voice 'Sky,' criticized for resembling the actress who plays 'Black Widow,' has been removed. OpenAI has also improved conversation speed, fluency in some foreign languages, and accents.

OpenAI says the advanced voice assistant can say "I'm sorry I'm late" in more than 50 languages, and attached a video clip to its social media post demonstrating a user asking the assistant to apologize to their grandmother for keeping her waiting. In the video, the AI assistant first summarizes the intended message and delivers it in English as requested; after the user notes that the grandmother speaks only Mandarin, the assistant repeats the message in standard Mandarin.

The new voice feature works with OpenAI's GPT-4o model, but not with the recently released o1 preview model.

The launch of the new voice feature arrives just in time. Huasheng Securities previously noted that in May of this year, OpenAI demonstrated the voice mode during the launch of its new flagship model, GPT-4o. At the time, the GPT-4o voice sounded like an adult American woman and could respond to requests instantly. When it heard OpenAI research director Mark Chen breathing heavily during the demonstration, it seemed to sense his nervousness and told him, "Mark, you're not a vacuum cleaner," urging him to relax his breathing.

OpenAI had originally planned to launch the voice mode for a small number of Plus subscribers at the end of June, but announced in June a one-month delay to ensure the feature could handle requests from millions of users safely and effectively. At the time, OpenAI said it planned to make the feature available to all Plus users by this fall, with the exact timing depending on whether it met the company's internal bar for safety and reliability.

At the end of July, OpenAI rolled out ChatGPT's advanced voice mode to a limited number of paid Plus users, stating that the voice mode cannot imitate another person's speaking style and that new filters had been added so the software can detect and reject requests to generate copyrighted music or other protected audio. However, the new voice mode still lacks several features OpenAI demonstrated in May, such as computer vision, which let GPT give voice feedback on a user's dance moves using only the smartphone's camera.

Editor/Lambor

The translation is provided by third-party software.


The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.