Nvidia (NVDA.US) launches Fugatto, a new AI model that can modify and generate sounds

Zhitong Finance · Nov 26, 2024 11:45

According to the Zhitong Finance APP, Nvidia (NVDA.US) has launched a new AI model for generating music and audio, aimed at people who produce music, films, and video games.

According to Nvidia, the model is called Fugatto (Foundational Generative Audio Transformer Opus) and can generate or modify music and sounds from any combination of text and audio files.

For example, the model can create a music clip from a text prompt, add or remove instruments in an existing song, change the accent or emotion of a voice, and even produce sounds that have never been heard before.

Rafael Valle, a manager of applied audio research at Nvidia and an orchestral conductor and composer, said, "We hope to create a model that understands and produces sound as humans do."

Nvidia noted that advertising agencies could use Fugatto to quickly adapt an existing ad for multiple regions, applying different accents and emotions to voiceovers. Video game developers could use the model to modify prerecorded assets so that they fit the player's changing actions during gameplay.

Fugatto can make a trumpet bark like a dog or a saxophone meow. The company added that, with fine-tuning on a small amount of singing data, researchers found the model could handle tasks it was not pretrained on, such as generating high-quality singing voices from text.

Nvidia stated that the full version of Fugatto has 2.5 billion parameters and was trained on Nvidia DGX systems with 32 Nvidia H100 Tensor Core GPUs. Overall work on the model took more than a year.

Fugatto may compete with similar technologies from startups such as Runway and large companies such as Meta Platforms (META.US). In October, Meta released an AI model called Movie Gen, which can create realistic video and audio clips from user prompts.

In February of this year, ChatGPT maker OpenAI introduced Sora, which can create realistic and imaginative scenes from text instructions. The Microsoft (MSFT.US)-backed company has not yet released the text-to-video model to the public.
