Pachyderm
Open-source ML data versioning and pipeline platform for reproducible machine learning workflows
Pachyderm is an open-source data versioning and ML pipeline platform built on Kubernetes. It supports reproducible machine learning by automatically tracking data provenance for every result. Like Git for data, Pachyderm versions both datasets and the computations performed on them, so users can trace exactly which data and code produced a given model or output. Each pipeline stage runs in a container, which makes workflows portable and reproducible across environments. Research teams and ML engineers use Pachyderm when data reproducibility, lineage tracking, and audit trails are critical requirements. These needs are common in healthcare AI, financial services, and scientific research, where models must be explainable and reproducible.
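To make the "Git for data" idea concrete, here is a minimal Python sketch of content-addressed commits with provenance. It is a toy model, not Pachyderm's API: the names `Commit`, `make_commit`, `run_stage`, and `lineage`, and the image tag `example/clean:1.0`, are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Commit:
    """A snapshot of a dataset plus a record of what produced it (toy model)."""
    commit_id: str
    files: dict             # path -> file content
    provenance: tuple = ()  # IDs of the commits this one was derived from
    image: str = ""         # container image (code version) that produced it


def make_commit(files, provenance=(), image=""):
    # The ID is a hash of the content and its inputs, so identical data
    # processed by identical code always yields the same ID.
    payload = json.dumps(
        {"files": files, "provenance": list(provenance), "image": image},
        sort_keys=True,
    )
    commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return Commit(commit_id, dict(files), tuple(provenance), image)


def run_stage(input_commit, image, transform):
    # Apply a transform to every file and record which commit and which
    # image the output came from. The image name stands in for a container.
    output = {path: transform(text) for path, text in input_commit.files.items()}
    return make_commit(output, provenance=(input_commit.commit_id,), image=image)


def lineage(commit, store):
    # Walk provenance links back to the raw data, newest commit first.
    chain = [commit.commit_id]
    for parent_id in commit.provenance:
        chain.extend(lineage(store[parent_id], store))
    return chain


# Hypothetical usage: one raw dataset, one cleaning stage.
raw = make_commit({"/reviews.txt": "Great product"})
cleaned = run_stage(raw, "example/clean:1.0", str.lower)
store = {c.commit_id: c for c in (raw, cleaned)}
print(lineage(cleaned, store))
```

Because each output records its input commit and the image that produced it, any result can be traced back to the exact data and code behind it. That is the property the description above refers to as provenance.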
Key Features
- Data versioning
- Pipeline automation
- Data provenance
- Kubernetes-native
- Reproducibility
Quick Info
- Category: MLOps
- Pricing: Freemium
More MLOps Tools
- Kubeflow (MLOps): Open-source Kubernetes-native ML platform for deploying scalable machine learning workflows
- Tecton (MLOps): Enterprise feature store for productionizing machine learning features and serving them in real time
- Hopsworks (MLOps): Open-source data platform for managing ML features, models, and pipelines end-to-end
- Guild AI (MLOps): Open-source experiment tracking and hyperparameter tuning tool for machine learning workflows