HuggingFace TGI

HuggingFace's production LLM serving toolkit with continuous batching and streaming

Text Generation Inference (TGI) is HuggingFace's production-ready toolkit for deploying and serving large language models, with features including continuous batching, token streaming, tensor parallelism, and quantization support. It provides a Docker-based deployment solution with optimized kernels for NVIDIA and AMD GPUs, making it straightforward to serve HuggingFace models in production, and it is the serving engine behind HuggingFace's hosted inference API. TGI is used for production serving by ML engineers deploying open-source models, by cloud providers, and by enterprises hosting their own LLM infrastructure.
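
As a quick sketch of the deployment workflow: TGI ships as a Docker image (ghcr.io/huggingface/text-generation-inference) that is launched with a --model-id flag, after which it exposes a REST API; launcher options such as --num-shard and --quantize back the multi-GPU and quantization features. The Python snippet below is a minimal illustration of calling TGI's documented /generate endpoint, assuming a server already running at localhost:8080 (a hypothetical local setup):

    import requests

    # Hypothetical local setup: a TGI container started from the official
    # Docker image and listening on port 8080.
    TGI_URL = "http://localhost:8080"

    payload = {
        "inputs": "What is continuous batching?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    }

    # POST to TGI's /generate endpoint; the response carries the
    # completion under the "generated_text" key.
    resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["generated_text"])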

Key Features

  • Continuous batching
  • Token streaming (see the sketch after this list)
  • Multi-GPU support
  • Quantization
  • Docker deployment
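
To illustrate the token streaming feature, here is a minimal sketch using huggingface_hub's InferenceClient pointed at a self-hosted TGI server; the localhost address and prompt are assumptions for illustration, not part of this listing:

    from huggingface_hub import InferenceClient

    # Hypothetical self-hosted TGI endpoint.
    client = InferenceClient("http://localhost:8080")

    # With stream=True, text_generation yields text chunks as tokens are
    # generated, instead of blocking until the full completion is ready.
    for token in client.text_generation(
        "Explain tensor parallelism in one sentence.",
        max_new_tokens=64,
        stream=True,
    ):
        print(token, end="", flush=True)
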
#model-serving #inference #open-source #huggingface #production

Get Started

Visit HuggingFace TGI: https://github.com/huggingface/text-generation-inference

Quick Info

Category: AI Infrastructure & MLOps
Pricing: Free (completely free to use)
