🔊

CosyVoice

Alibaba's multilingual TTS model with voice cloning and instruction-following

Voice AI

Free

CosyVoice is Alibaba's open-source multilingual speech synthesis model. It supports zero-shot voice cloning from a few seconds of reference audio, cross-lingual voice transfer, and instruction-following for controlling speaking style. It produces high-quality speech in Chinese, English, Japanese, Korean, and other languages with consistent voice characteristics. Developers building multilingual AI applications, localization tools, and voice-enabled products use CosyVoice for these cloning and style-control capabilities.