Meta introduces Voicebox: an all-in-one speech generative AI model

Devdiscourse News Desk | California | Updated: 16-06-2023 22:51 IST | Created: 16-06-2023 22:51 IST

Country: United States

Video Credit: Meta

Meta has introduced Voicebox, a generative AI model for speech that uses in-context learning to generalize across tasks. According to the company, the model can perform speech generation tasks it was not specifically trained for, such as editing, sampling, and stylizing.

One of Voicebox's most notable features is that it can both produce high-quality audio clips and edit pre-recorded audio, removing unwanted sounds such as car horns or a barking dog while preserving the content and style of the original recording.

Voicebox is also multilingual: it can generate speech in six languages, namely English, French, German, Spanish, Polish and Portuguese.

In the future, multipurpose generative AI models like Voicebox could provide natural-sounding voices for virtual assistants and non-player characters in the metaverse, revolutionizing human-machine interactions. For visually impaired individuals, AI-powered voices could read written messages in the familiar voices of friends.

The AI model also offers content creators powerful tools for audio track creation and editing.

Voicebox's flexibility allows it to handle a range of tasks, including in-context text-to-speech synthesis. Given an audio sample as short as two seconds, the AI model can match the sample's style and use it to generate text-to-speech output. It can also recreate speech segments interrupted by noise or replace misspoken words, so the speaker does not have to re-record the entire passage. It essentially functions as an "eraser" for audio editing.

Further, with its cross-lingual style transfer ability, Voicebox can take a sample of someone's speech and a passage of text in any of the supported languages and produce a reading of that text in the style of the sample. This could eventually help people who speak different languages communicate naturally and authentically.

Lastly, Voicebox supports diverse speech sampling. Having been trained on a wide range of data, the AI model can generate speech that better reflects how people naturally speak in real-world scenarios across the six supported languages.
