🔊

Audiobox

Meta's unified audio generation model for voice and sound effects

Audio & Speech

Free

Audiobox is Meta's research model for unified audio generation, combining speech synthesis, sound-effect generation, and audio editing in a single model. Users can generate speech from text, create sound effects from natural-language descriptions, and use audio inpainting to remove or replace specific sounds in an existing recording. Built on flow matching, Audiobox generates audio in a range of styles and voices, including voices cloned from a short reference sample. It supports text-to-speech, voice conversion, audio style transfer, and generation conditioned on acoustic descriptions. Researchers and developers can apply it to podcasts, audiobooks, game audio, and other multimedia content. The model also works zero-shot: it can produce combinations of voice, style, and sound that it was not specifically trained on. Meta has published research papers on Audiobox and offers only limited access to the model.