Nvidia researchers have created a new artificial intelligence (AI) audio generator called Fugatto that they claim can create sounds never heard before.
Fugatto (short for Foundational Generative Audio Transformer Opus 1) was created to be the “Swiss Army knife for sound” and allows users to edit or generate audio with simple text prompts, the semiconductor giant wrote in a blog post on November 25, 2024.
Examples of these prompts can include removing a particular instrument from a song, changing the accent of someone’s voice, and so on.
“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.
Fugatto’s applications can be diverse. For example, an ad agency could use it to adapt ads for multiple regions by applying different accents and emotions to voiceovers, online courses could be delivered in the voice of a family member or friend, and video game developers could use it to create new audio assets on the fly.
It can also go as far as making a trumpet bark or a saxophone meow. The only limit is the user’s imagination.
The researchers even found it can handle tasks it was never trained to do, such as generating a high-quality singing voice from a text prompt.
The model uses a technique called ComposableART to combine instructions. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.
It can also generate sounds that change over time, a feature called temporal interpolation. For instance, it can create the sounds of a rainstorm moving through an area, with crescendos of thunder that slowly fade into the distance, and it gives users fine-grained control over how the soundscape evolves.
The tool was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Nvidia claims this made Fugatto’s multi-accent and multilingual capabilities stronger.
Fugatto’s full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.
However, the technology also revives long-standing concerns that AI could take over people’s jobs and lead to copyright disputes.
For example, the Australian Association of Voice Actors warned a parliamentary committee that an estimated 5,000 local voice actors could soon be out of work if companies opt for AI-based replacements.
The music industry has also raised concerns about generative AI infringing on copyrighted content. The Recording Industry Association of America, for instance, recently filed a lawsuit against AI tools for allegedly replicating copyrighted music.
There is a more positive side as well: artists can use the tool to aid their work.
“Sound is my inspiration. It’s what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible,” said Ido Zmishlany, a multi-platinum producer and songwriter, and cofounder of One Take Audio, a member of the NVIDIA Inception program for startups.