Aphrodite Engine
Production LLM serving engine focused on high concurrency and diverse quantization support
Aphrodite Engine is an open-source LLM serving engine forked from vLLM, with an additional focus on supporting a wider range of quantization formats (GPTQ, AWQ, EXL2, GGUF, and more) and high-concurrency scenarios. It extends vLLM's paged-attention approach with support for less common model architectures and features requested by the local AI community. Developers hosting LLM APIs, researchers deploying custom model variants, and AI platform builders use Aphrodite as a flexible serving backend that handles more model formats than mainstream alternatives.
Key Features
- Wide quantization support
- High concurrency
- Paged attention
- GGUF support
- OpenAI-compatible API
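Because the server exposes an OpenAI-compatible API, existing OpenAI client code can target it by changing the base URL. A minimal sketch of building a chat-completion request, assuming a locally running Aphrodite server (the port and model name below are placeholders, not confirmed defaults):

```python
import json

# Assumed local endpoint; adjust host/port to match your Aphrodite launch flags.
API_URL = "http://localhost:2242/v1/chat/completions"

# Standard OpenAI-style chat payload; the model name is a placeholder.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "temperature": 0.7,
}

body = json.dumps(payload)
# To send the request (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       API_URL, data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The same payload also works with the official `openai` Python client by setting its `base_url` to the server address, which is the usual way to swap Aphrodite in for a hosted API.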
Quick Info
- Category: AI Infrastructure
- Pricing: Free