Aphrodite Engine
Production LLM serving engine focused on high concurrency and diverse quantization support
Aphrodite Engine is an open-source LLM serving engine forked from vLLM, with an additional focus on supporting a wider range of quantization formats (GPTQ, AWQ, EXL2, GGUF, and more) and high-concurrency scenarios. It extends vLLM's paged-attention approach with support for less common model architectures and features requested by the local AI community. Developers hosting LLM APIs, researchers deploying custom model variants, and AI platform builders use Aphrodite as a flexible serving backend that handles more model formats than mainstream alternatives.
Key Features
- Wide quantization support
- High concurrency
- Paged attention
- GGUF support
- OpenAI-compatible API
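Because the server exposes an OpenAI-compatible API, existing OpenAI client code can target it by changing the base URL. A minimal sketch of building a chat-completion request, assuming a locally running Aphrodite server (the port and model name below are placeholders, not confirmed defaults):

```python
import json

# Assumed local endpoint; adjust host/port to match your Aphrodite launch flags.
API_URL = "http://localhost:2242/v1/chat/completions"

# Standard OpenAI-style chat payload; the model name is a placeholder.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "temperature": 0.7,
}

body = json.dumps(payload)
# To send the request (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       API_URL, data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The same payload also works with the official `openai` Python client by setting its `base_url` to the server address, which is the usual way to swap Aphrodite in for a hosted API.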
Quick Info
- Category: AI Infrastructure
- Pricing: Free