
Zephyr

HuggingFace's distilled LLM trained with direct preference optimization

Code & Development

Zephyr is a series of language models from HuggingFace trained with distilled supervised fine-tuning (dSFT) followed by Direct Preference Optimization (DPO) on AI-generated preference data. Zephyr-7B-beta, fine-tuned from Mistral-7B, was a breakthrough model: it demonstrated that DPO training on AI feedback can produce a 7B model that outperforms much larger instruction-tuned models on chat benchmarks. HuggingFace releases Zephyr as a research artifact demonstrating alignment training techniques.
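To make the training objective concrete, here is a minimal sketch of the DPO loss for a single preference pair. The function name and example log-probability values are illustrative, not from Zephyr's actual training code; the formula itself is the standard DPO objective, which scores how strongly the policy prefers the chosen response over the rejected one relative to a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model; beta controls how
    far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response relative to the reference -> loss below log 2
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.1)
```

Because the loss depends only on log-probability ratios, DPO needs no separate reward model, which is part of why it is cheap enough to run on AI-generated preference data at 7B scale.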

Key Features

  • DPO training
  • 7B parameters
  • AI feedback alignment
  • HuggingFace native
  • Apache 2.0
  • Research focused
#llm #huggingface #dpo #alignment #open-source
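Since the model is HuggingFace native, prompts follow its chat template, which wraps turns in `<|system|>`, `<|user|>`, and `<|assistant|>` markers. The helper below is a hypothetical sketch of that format for illustration; in practice you would call the tokenizer's `apply_chat_template` rather than building strings by hand.

```python
def format_zephyr_prompt(system, user):
    # Sketch of Zephyr's chat format (assumed layout): role markers
    # separated by </s> end-of-sequence tokens, ending at the point
    # where the assistant's reply is generated.
    return (f"<|system|>\n{system}</s>\n"
            f"<|user|>\n{user}</s>\n"
            f"<|assistant|>\n")

prompt = format_zephyr_prompt("You are a helpful assistant.", "What is DPO?")
```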

Get Started

Visit Zephyr
🟢
Free
Completely free to use

Quick Info

Category
Code & Development
Pricing
Free

More Code & Development Tools