
LLaVA AI

Open-source multimodal large language model combining vision and language understanding

AI Infrastructure & MLOps

LLaVA (Large Language and Vision Assistant) is an open-source multimodal model that connects a visual encoder with a language model to enable visual instruction following and image-based conversations. Developed by researchers at the University of Wisconsin-Madison and Microsoft Research, LLaVA models can describe images, answer visual questions, analyze charts, and understand scene content from natural language instructions. LLaVA has become a foundational reference architecture for open-source vision-language models, and researchers, developers building visual AI applications, and multimodal practitioners commonly use it as a baseline for vision-language tasks.
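As a rough illustration of how a LLaVA model is typically run, the sketch below performs image Q&A through the Hugging Face transformers integration. The llava-hf/llava-1.5-7b-hf checkpoint, the example image URL, and the "USER: <image> ... ASSISTANT:" prompt template are assumptions based on the community LLaVA 1.5 release, not details from this listing.

# Minimal sketch: visual question answering with a LLaVA 1.5 checkpoint
# via Hugging Face transformers. Model id, image URL, and prompt format
# are assumptions from the community release, not from this page.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; here one is fetched over HTTP (placeholder URL).
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA 1.5 expects an <image> token marking where the visual encoder's
# projected features are spliced into the language model's input.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))

The processor handles both image preprocessing for the vision encoder and tokenization for the language model, which is why a single call accepts the image and the prompt together.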

Key Features

  • Vision-language model
  • Image Q&A
  • Visual instruction following
  • Open-source
  • Chart understanding

#multimodal #vision-language #open-source #research #llm

Get Started

Visit LLaVA AI
Free
Completely free to use

Quick Info

Category
AI Infrastructure & MLOps
Pricing
Free
