
Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks



Google DeepMind has unveiled a new AI tool for generating video soundtracks. In addition to using a text prompt to generate audio, the tool also takes the content of the video itself into account.

By combining the two, DeepMind says, users can create scenes with “a dramatic soundtrack, realistic sound effects, or dialogue that matches the characters and tone of a video.” You can see some of the examples posted on the DeepMind website, and they look very good.

For a video of a car driving through a cyberpunk-style cityscape, Google used the prompt “cars skidding, car engine revving, angelic electronic music” to generate audio. You can see how the skidding sounds match the car’s movement. Another example creates an underwater soundscape from the prompt “jellyfish pulsing under water, sea life, ocean.”

Users can include a text prompt, but DeepMind says it’s optional, and users don’t need to meticulously match the generated audio to the appropriate scenes. DeepMind also says the tool can generate an “unlimited” number of soundtracks for a video, giving users an effectively endless supply of audio options.

This could help it stand out from other AI tools, like ElevenLabs’ sound effects generator, which generates audio from text prompts. It could also make it easier to pair audio with AI-generated video from tools like Google DeepMind’s Veo and OpenAI’s Sora (the latter of which plans to eventually incorporate audio).

DeepMind says it trained the tool on video, audio, and annotations containing “detailed sound descriptions and transcripts of spoken dialogue.” This lets the video-to-audio generator associate audio events with specific visual scenes.

The tool still has some limitations. For example, DeepMind is trying to improve its ability to sync lip movement with dialogue, as you can see in this video of a claymation family. DeepMind also notes that its video-to-audio system depends on video quality, so anything grainy or distorted “may lead to a noticeable drop in audio quality.”


