Nvidia (NVDA.US) has launched a new AI model for generating music and audio, aimed at producers of music, films, and video games.
According to Zhichuang Finance APP, Nvidia (NVDA.US) has launched a new AI model for generating music and audio, aimed at people who produce music, films, and video games.
According to Nvidia, the model is called Fugatto (Foundational Generative Audio Transformer Opus) and can generate or modify music and sound from any combination of text and audio files.
For example, the model can create music clips from text prompts, add or remove instruments in existing songs, change the accent or emotion of a voice, and even produce sounds that have never been heard before.
Rafael Valle, an audio research manager at Nvidia who is also a conductor and composer, stated, "We hope to create a model that understands and produces sound as humans do."
Nvidia noted that advertising agencies can use Fugatto to quickly localize existing advertisements for multiple regions, applying different accents and emotions to voiceovers. Additionally, video game developers can use the AI model to modify pre-recorded in-game assets so that they adapt to players' constantly changing actions.
Fugatto can make a trumpet bark like a dog or a saxophone meow. The company added that, with fine-tuning and a small amount of singing data, researchers found the model can handle tasks it was not trained on, such as generating high-quality singing from text.
Nvidia stated that the full version of Fugatto has 2.5 billion parameters and was trained on Nvidia DGX systems with a total of 32 Nvidia H100 Tensor Core GPUs. Work on the model took more than a year overall.
Fugatto may compete with similar technologies from startups such as Runway and large companies such as Meta Platforms (META.US). In October, Meta launched an AI model called Movie Gen, which can create realistic video and audio clips based on user prompts.
In February of this year, OpenAI, the maker of ChatGPT, unveiled Sora, which can create realistic and imaginative scenes from text instructions. The company, which is backed by Microsoft (MSFT.US), has not yet released the text-to-video model to the public.