LLaVA AI
Open-source multimodal large language model combining vision and language understanding
LLaVA (Large Language and Vision Assistant) is an open-source multimodal model that connects a pretrained vision encoder (CLIP) to a large language model through a lightweight projection layer, enabling visual instruction following and image-based conversations. Developed by researchers at the University of Wisconsin-Madison and Microsoft Research, LLaVA models can describe images, answer visual questions, analyze charts, and understand scene content from natural language instructions. LLaVA has become a foundational reference architecture for open-source vision-language models, and AI researchers, developers building visual AI applications, and multimodal practitioners use it as a baseline for vision-language tasks.
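As a minimal sketch of how image-based Q&A with LLaVA looks in practice, the snippet below uses the community `llava-hf/llava-1.5-7b-hf` checkpoint and the Hugging Face `transformers` integration; the image URL is a placeholder, and prompt formatting and model IDs may differ between LLaVA versions.

```python
# Hedged sketch: query a LLaVA-1.5 checkpoint via Hugging Face transformers.
# The checkpoint id and image URL below are illustrative assumptions.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Any RGB image works; a URL is loaded here purely for illustration.
image = Image.open(
    requests.get("https://example.com/chart.png", stream=True).raw
)

# LLaVA-1.5 style prompt: the <image> token marks where the projected
# visual features are inserted into the language model's input sequence.
prompt = "USER: <image>\nWhat does this chart show? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```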
Key Features
- ✓ Vision-language model
- ✓ Image Q&A
- ✓ Visual instruction following
- ✓ Open-source
- ✓ Chart understanding
Quick Info
- Category: AI Infrastructure & MLOps
- Pricing: Free
More AI Infrastructure & MLOps Tools
- Dstack: Open-source, cloud-agnostic platform for AI/ML workload orchestration
- Tigris Data: AI-native object storage with built-in vector search and S3 compatibility
- Superlinked: Vector compute framework that helps ML engineers build retrieval systems by combining multiple data types a…
- Qdrant Cloud: Managed vector database cloud service offering high-performance similarity search with filtering, payload i…