HuggingFace TGI
HuggingFace's production LLM serving toolkit with continuous batching and streaming
Text Generation Inference (TGI) is HuggingFace's production-ready toolkit for deploying and serving large language models, with features including continuous batching, token streaming, tensor parallelism, and quantization support. It ships as a Docker-based deployment with optimized kernels for NVIDIA and AMD GPUs, making it straightforward to serve HuggingFace models in production. TGI is the serving engine behind HuggingFace's hosted inference API, and it is used for production serving by ML engineers deploying open-source models, cloud providers, and enterprises hosting their own LLM infrastructure.
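As a rough sketch of what serving looks like in practice, the snippet below sends a single prompt to a TGI server's `/generate` REST endpoint. It assumes an instance is already running (e.g. started separately via the official Docker image); the host, port, prompt, and parameter values are illustrative placeholders, not defaults fixed by TGI.

```python
# Minimal sketch: query a running TGI server over its REST API.
import requests

TGI_URL = "http://localhost:8080"  # assumption: a local TGI container

response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is continuous batching?",
        # Sampling parameters are optional; these values are illustrative.
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
# TGI returns the completion under the "generated_text" key.
print(response.json()["generated_text"])
```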
Key Features
- Continuous batching
- Token streaming (see the streaming sketch after this list)
- Multi-GPU support
- Quantization
- Docker deployment
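To illustrate the token-streaming feature referenced above, here is a hedged sketch using the `InferenceClient` from the `huggingface_hub` library, which can talk to a TGI endpoint. It again assumes a local server at `http://localhost:8080`; the prompt and parameters are placeholders.

```python
# Sketch of token streaming against a TGI server, assuming it is
# reachable at http://localhost:8080.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumption: local server

# With stream=True, tokens are yielded incrementally as the server
# generates them, instead of waiting for the full completion.
for token in client.text_generation(
    "Explain tensor parallelism in one sentence.",
    max_new_tokens=64,
    stream=True,
):
    print(token, end="", flush=True)
print()
```

Streaming like this is what lets chat UIs show partial output immediately while the server's continuous batcher keeps the GPU busy with other requests.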
Quick Info
- Category: AI Infrastructure & MLOps
- Pricing: Free