Ray Serve

Scalable ML model serving framework built on Ray for Python applications

Ray Serve is a scalable, framework-agnostic model serving library built on the Ray distributed computing framework. It enables ML engineers to deploy, scale, and update ML models and Python applications as microservices with production-grade reliability. Ray Serve supports serving multiple models together in inference pipelines, A/B testing between model versions, automatic scaling based on traffic load, and deployment to Kubernetes with KubeRay. Unlike narrow model serving tools, Ray Serve is designed for complex serving patterns including multi-model chains, business logic alongside models, and mixed CPU/GPU deployments. It is used by teams at Shopify, LinkedIn, and Instacart for high-throughput ML serving.

Key Features

  • Scalable model serving
  • Multi-model pipelines
  • A/B testing
  • Kubernetes integration
  • Auto-scaling
  • Framework agnostic
#mlops #model-serving #python #kubernetes #distributed

Get Started

Visit Ray Serve
Free
Completely free to use

Quick Info

Category
Code & Development
Pricing
Free
