NVIDIA Triton Inference Server
Open-source AI model serving software for production inference
NVIDIA Triton Inference Server is open-source inference serving software for deploying AI models at production scale. It supports all major ML frameworks (TensorFlow, PyTorch, ONNX, TensorRT) and hardware backends (GPUs, CPUs, custom accelerators), and provides dynamic batching, concurrent model execution, and model ensemble support. Triton is widely used as an inference server for high-throughput, low-latency AI serving in data centers and cloud environments.
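Triton loads models from a model repository, where each model directory contains a `config.pbtxt` describing its inputs, outputs, and scheduling options. The sketch below shows a minimal configuration for a hypothetical ONNX image classifier; the model name, tensor names, and dimensions are placeholders, not values from this page.

```
# model_repository/resnet50/config.pbtxt (illustrative example)
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"            # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"           # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {
  # Wait up to 100 µs to group incoming requests into larger batches.
  max_queue_delay_microseconds: 100
}
```

The `dynamic_batching` block is what enables the dynamic batching feature mentioned above: Triton transparently merges individual client requests into batches up to `max_batch_size` to improve GPU utilization.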
Key Features
- Multi-framework support
- Dynamic batching
- Concurrent model execution
- GPU optimization
- gRPC and HTTP APIs
- Model ensembles
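Triton's HTTP API follows the KServe v2 inference protocol: clients POST a JSON body to `/v2/models/<model_name>/infer`. The sketch below builds such a request body with only the standard library; the model name `add_sub` and tensor names `INPUT0`/`OUTPUT0` are hypothetical placeholders.

```python
import json

# KServe v2 inference request body, as accepted by Triton's HTTP endpoint
# POST /v2/models/add_sub/infer. Names and shapes here are placeholders.
payload = {
    "inputs": [
        {
            "name": "INPUT0",          # hypothetical input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],
        }
    ],
    "outputs": [{"name": "OUTPUT0"}],  # hypothetical output tensor name
}

body = json.dumps(payload)
print(body)
```

In practice the same request is usually issued through the official `tritonclient` Python package, which wraps both the HTTP and gRPC endpoints listed in the features above.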
Quick Info
- Category: Code & Development
- Pricing: Free