Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models
Yanggan Gu, Junzhuo Li, Sirui Huang, Xin Zou, Zhenghua Li, Xuming Hu

TL;DR
This paper introduces Preference-Aligned Distillation (PAD), a method for aligning small language models with human preferences. PAD distills the teacher's preference knowledge as a probability distribution rather than as pairwise comparisons, and it improves on existing distillation methods.
Contribution
PAD models teacher preferences as probability distributions over all responses, capturing nuanced preferences and outperforming existing distillation approaches.
Findings
PAD achieves over 20% improvement on AlpacaEval 2 and Arena-Hard.
PAD surpasses teacher models on MT-Bench with Gemma models.
Experiments validate PAD's superior alignment with human preferences.
Abstract
Aligning small language models (SLMs) with human values typically involves distilling preference knowledge from large language models (LLMs). However, existing distillation methods model preference knowledge in teacher LLMs by comparing pairwise responses, overlooking the extent of difference between responses. This limitation hinders student SLMs from capturing nuanced preferences among multiple responses. In this paper, we propose a Preference-Aligned Distillation (PAD) framework, which models the teacher's preference knowledge as a probability distribution over all potential preferences, thereby providing more nuanced supervisory signals. Our insight in developing PAD is rooted in the demonstration that language models can serve as reward functions, reflecting their intrinsic preferences. Based on this, PAD comprises three key steps: (1) sampling diverse responses using…
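To make the contrast between pairwise comparison and a distribution over preferences concrete, here is a minimal sketch. It is not the paper's implementation: the abstract above is truncated and does not specify how the distribution is built, so the Plackett-Luce model used here, the function name `ranking_distribution`, and the reward values are all assumptions chosen for illustration.

```python
import itertools
import math


def ranking_distribution(rewards):
    """Assign a probability to every ordering of the responses.

    Assumption: a Plackett-Luce model over exponentiated rewards. The
    source abstract does not name the model PAD actually uses.
    """
    top = max(rewards)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(r - top) for r in rewards]
    dist = {}
    for perm in itertools.permutations(range(len(rewards))):
        prob = 1.0
        remaining = sum(weights)
        # Pick responses one at a time, each in proportion to its weight
        # among the responses not yet placed.
        for idx in perm:
            prob *= weights[idx] / remaining
            remaining -= weights[idx]
        dist[perm] = prob
    return dist


# Hypothetical teacher rewards for three sampled responses.
# Responses 0 and 1 are nearly tied; response 2 is clearly worse.
rewards = [2.0, 1.9, -1.0]
dist = ranking_distribution(rewards)

for perm, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(perm, round(prob, 4))
```

A pairwise label would record "response 0 beats response 1" and "response 0 beats response 2" with equal certainty. Under the distribution, the orderings that swap responses 0 and 1 receive almost the same probability, while orderings that place response 2 first are rare, so the size of each gap is preserved as a training signal.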
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: ALIGN
