Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, and Thomas Wolf

TL;DR
Zephyr-7B is a 7B-parameter chat model aligned to user intent with distilled direct preference optimization (dDPO) on AI-feedback preference data. It sets the state of the art on chat benchmarks for 7B-parameter models, trains in a few hours, and needs no human annotation.
Contribution
The paper applies distilled direct preference optimization (dDPO) to a dataset of outputs ranked by a teacher model, yielding Zephyr-7B, a small chat model that surpasses the much larger Llama2-Chat-70B on MT-Bench.
Findings
Zephyr-7B surpasses Llama2-Chat-70B on MT-Bench.
Training takes only a few hours and needs no additional sampling during fine-tuning.
Training requires no human annotation.
Abstract
We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B,…
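The abstract names dDPO but does not spell out its objective. As a rough illustration, the sketch below computes the standard direct preference optimization (DPO) loss for a single preference pair, on the assumption that dDPO uses this loss with teacher-ranked responses as the "chosen" and "rejected" examples. The function names, argument names, and the value `beta=0.1` are illustrative choices and are not taken from the paper.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is the summed log-probability of a full response
    under either the model being trained (policy) or a frozen
    reference model (for example, the dSFT checkpoint).
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains margin over the rejected one.
    margin = beta * (chosen_reward - rejected_reward)
    return -log_sigmoid(margin)


# A policy identical to the reference has zero margin.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# A policy that shifts probability toward the chosen response has a lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

Because the loss depends only on log-probabilities of responses already in the dataset, computing it needs no generation from the policy, which is consistent with the abstract's remark that fine-tuning requires no additional sampling.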
Code & Models
- 🤗 Gxl/sda (model)
- 🤗 HuggingFaceH4/zephyr-7b-alpha (model) · 4.0k dl · ♡ 1120
- 🤗 HuggingFaceH4/zephyr-7b-beta (model) · 136k dl · ♡ 1836
- 🤗 TheBloke/zephyr-7B-beta-GGUF (model) · 2.1k dl · ♡ 231
- 🤗 TheBloke/zephyr-7B-beta-GPTQ (model) · 810 dl · ♡ 58
- 🤗 TheBloke/zephyr-7B-beta-AWQ (model) · 3.1k dl · ♡ 39
- 🤗 LoneStriker/zephyr-7b-beta-3.0bpw-h6-exl2 (model) · 2 dl
- 🤗 LoneStriker/zephyr-7b-beta-4.0bpw-h6-exl2 (model) · 2 dl
- 🤗 LoneStriker/zephyr-7b-beta-5.0bpw-h6-exl2 (model) · 5 dl · ♡ 2
- 🤗 LoneStriker/zephyr-7b-beta-6.0bpw-h6-exl2 (model) · 2 dl
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Recommender Systems and Techniques
