Although Microsoft and Google currently dominate the AI headlines, numerous other companies are entering the market as well. Among them is Facebook parent company Meta, which has just announced a new generative AI project. According to a company blog post, Meta is working on a generative AI tool for speech. The tool is called Voicebox and can even perform language tasks “that it wasn’t specifically trained to do through in-context learning.”
According to Meta, these tasks include in-context text-to-speech synthesis, speech editing, noise reduction, cross-lingual style transfer, and diverse speech sampling. The company describes these features as follows:
- In-context text-to-speech: Uses audio samples as short as two seconds to match the audio style and apply it to text-to-speech generation.
- Speech editing and noise reduction: The tool can recreate a portion of speech that was interrupted by noise or replace misspoken words without having to re-record.
- Cross-lingual style transfer: The tool can take a sample of speech and a passage of text to produce a reading of the text in English, French, German, Spanish, Polish, or Portuguese.
- Diverse speech sampling: Uses diverse data to generate speech more representative of how people talk in the six languages mentioned previously.
According to Meta, Voicebox is part of the company’s own research on generative AI. As for its usefulness, Meta says:
“In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player-characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”
You can find examples of Voicebox functions and a video in Meta’s blog post linked below.