Meta introduces Voicebox: an all-in-one speech generative AI model


Devdiscourse News Desk | California | Updated: 16-06-2023 22:51 IST | Created: 16-06-2023 22:51 IST

 


Meta has introduced Voicebox, a generative AI model for speech that uses in-context learning to generalize across tasks. According to the company, the model can perform speech generation tasks it was not specifically trained for, such as editing, sampling, and stylizing.

Among Voicebox's most notable features are its ability to produce high-quality audio clips and to edit pre-recorded audio, removing unwanted sounds such as car horns or a barking dog while preserving the content and style of the original recording.

Furthermore, Voicebox's multilingual capabilities allow it to generate speech in six languages: English, French, German, Spanish, Polish and Portuguese.

In the future, multipurpose generative AI models like Voicebox could provide natural-sounding voices for virtual assistants and non-player characters in the metaverse, revolutionizing human-machine interactions. For visually impaired individuals, AI-powered voices could read written messages in the familiar voices of friends.

The AI model also offers content creators powerful tools for audio track creation and editing.

Voicebox's flexibility lets it handle a range of tasks, including in-context text-to-speech synthesis. From an audio sample as short as two seconds, the AI model can match the sample's style and use it to generate text-to-speech output. It can also recreate speech segments interrupted by noise or replace misspoken words, eliminating the need for extensive re-recording. In effect, it functions as an "eraser" for audio editing.
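
The article does not describe an API for Voicebox, so the following is only a toy illustration of the "eraser" idea: marking a time span of a recording as the part to be regenerated. The function names `mask_span` and `apply_eraser` are hypothetical, the silence fill stands in for what a generative model would synthesize, and none of this reflects Meta's actual implementation.

```python
def mask_span(num_samples, sample_rate, start_s, end_s):
    """Return a list of booleans, True where audio is marked for regeneration.

    Hypothetical helper: converts a time span in seconds to sample indices.
    """
    if not 0 <= start_s <= end_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    # Clamp to the recording length so a span past the end is handled safely.
    start = min(int(round(start_s * sample_rate)), num_samples)
    end = min(int(round(end_s * sample_rate)), num_samples)
    return [start <= i < end for i in range(num_samples)]


def apply_eraser(samples, mask, fill=0.0):
    """Blank out the masked samples and keep everything else untouched.

    Assumption: a speech model would infill the masked region from the
    surrounding audio and the transcript; here it is replaced with silence.
    """
    return [fill if masked else s for s, masked in zip(samples, mask)]


# Example: a 3-second clip at a toy rate of 10 samples per second,
# with a car horn assumed to occupy the second from 1.0 s to 2.0 s.
clip = [1.0] * 30
mask = mask_span(len(clip), sample_rate=10, start_s=1.0, end_s=2.0)
cleaned = apply_eraser(clip, mask)
```

Only the masked second is altered; the audio before and after it is preserved, which is the property the article attributes to Voicebox's editing.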

Further, with its cross-lingual style transfer ability, Voicebox can take a sample of someone's speech and a passage of text in any of the supported languages and produce a reading of that text in the speaker's style, facilitating natural and authentic communication between individuals who speak different languages.

Lastly, Voicebox's diverse speech sampling is another notable achievement. Having been trained on a wide range of data, the AI model can generate speech that better reflects how people naturally speak in real-world scenarios and across the six supported languages.
