Skip to main content
⚙️

Aphrodite Engine

Production LLM serving engine focused on high concurrency and diverse quantization support

AI Infrastructure
Aphrodite Engine logo

Aphrodite Engine

Production LLM serving engine focused on high concurrency and diverse quantization support

Aphrodite Engine is an open-source LLM serving engine forked from vLLM with additional focus on supporting a wider range of quantization formats (GPTQ, AWQ, EXL2, GGUF, and more) and higher concurrency scenarios. It extends the vLLM paged attention approach with support for exotic model types and community-oriented features requested by the local AI community. Developers hosting LLM APIs, researchers deploying custom model variants, and AI platform builders use Aphrodite as a flexible serving backend that handles more model formats than mainstream alternatives.

Key Features

  • Wide quantization support
  • High concurrency
  • Paged attention
  • GGUF support
  • OpenAI-compatible API
#llm-serving#quantization#open-source#inference#vllm-fork

Get Started

Visit Aphrodite Engine
🟢
Free
Completely free to use

Quick Info

Category
AI Infrastructure
Pricing
Free

More AI Infrastructure Tools