Nvidia (NVDA.US) launches Fugatto, a new AI model that can modify and generate new sounds

Zhitong Finance ·  Nov 26, 2024 11:45

According to the Zhitong Finance app, Nvidia (NVDA.US) has launched a new AI model for generating music and audio, aimed at people who produce music, films, and video games.

According to Nvidia, the model is called Fugatto (Foundational Generative Audio Transformer Opus) and can generate or modify music and sound from any combination of text prompts and audio files.

For example, the model can create music clips from text prompts, add instruments to or remove them from existing songs, change the accent or emotion of a voice, and even produce sounds that have never been heard before.

Rafael Valle, a manager of audio research at Nvidia who is also a conductor and composer, said, "We hope to create a model that understands and produces sound as humans do."

Nvidia noted that advertising agencies can use Fugatto to quickly localize existing advertisements for multiple regions, applying different accents and emotions to voiceovers. Video game developers can also use the AI model to modify pre-recorded assets so that they adapt to players' constantly changing actions.

Fugatto can make a trumpet bark like a dog or a saxophone meow. The company added that, with fine-tuning and a small amount of singing data, researchers found the model can handle tasks it was not pretrained on, such as generating high-quality singing from text.

Nvidia stated that the full version of Fugatto has 2.5 billion parameters and was trained on Nvidia DGX systems containing 32 Nvidia H100 Tensor Core GPUs. Work on the model took more than a year.

Fugatto may compete with similar technologies from startups such as Runway and large companies such as Meta Platforms (META.US). In October, Meta launched an AI model called Movie Gen, which can create realistic video and audio clips based on user prompts.

In February of this year, OpenAI, the maker of ChatGPT, unveiled Sora, which can create realistic and imaginative scenes from text instructions. The company, which is backed by Microsoft (MSFT.US), has not yet released the text-to-video model to the public.

The translation is provided by third-party software.

