Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models

Yanggan Gu; Junzhuo Li; Sirui Huang; Xin Zou; Zhenghua Li; Xuming Hu

arXiv:2502.14272 · cs.CL · February 21, 2025

TL;DR

This paper introduces Preference-Aligned Distillation (PAD), a novel method for aligning small language models with human preferences by modeling nuanced preference distributions, leading to significant improvements over existing methods.

Contribution

PAD models teacher preferences as probability distributions over all responses, capturing nuanced preferences and outperforming existing distillation approaches.

Findings

01. PAD improves on existing distillation approaches by over 20% on AlpacaEval 2 and Arena-Hard.

02. With Gemma models, the student trained with PAD surpasses its teacher on MT-Bench.

03. Experiments indicate that PAD aligns more closely with human preferences than existing methods.

Abstract

Aligning small language models (SLMs) with human values typically involves distilling preference knowledge from large language models (LLMs). However, existing distillation methods model preference knowledge in teacher LLMs by comparing pairwise responses, overlooking the extent of the difference between responses. This limitation hinders student SLMs from capturing nuanced preferences across multiple responses. In this paper, we propose a Preference-Aligned Distillation (PAD) framework, which models the teacher's preference knowledge as a probability distribution over all potential preferences, thereby providing more nuanced supervisory signals. Our insight in developing PAD is rooted in the demonstration that language models can serve as reward functions, reflecting their intrinsic preferences. Based on this, PAD comprises three key steps: (1) sampling diverse responses using…
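
The contrast the abstract draws between pairwise comparison and a distribution over preferences can be illustrated with a small sketch. It is not the paper's implementation: the reward values are made up, and using a Plackett-Luce model to turn rewards into a distribution over rankings is an assumption chosen for illustration, since the abstract shown here is truncated before the method details. The function names `ranking_distribution` and `pairwise_label` are hypothetical.

```python
import itertools
import math


def ranking_distribution(rewards):
    """Probability of every ranking of the responses under a
    Plackett-Luce model (an illustrative assumption, not confirmed
    by the abstract). Keys are tuples of response indices, best first."""
    weights = [math.exp(r) for r in rewards]
    dist = {}
    for perm in itertools.permutations(range(len(rewards))):
        prob = 1.0
        remaining = sum(weights)
        for idx in perm:
            # Pick the next-best response among those not yet ranked.
            prob *= weights[idx] / remaining
            remaining -= weights[idx]
        dist[perm] = prob
    return dist


def pairwise_label(rewards, i, j):
    """Hard pairwise signal: 1 if response i is preferred to j, else 0."""
    return 1 if rewards[i] > rewards[j] else 0


# Hypothetical teacher rewards for three responses to one prompt.
close_rewards = [1.0, 0.9, -2.0]  # responses 0 and 1 are nearly tied
far_rewards = [3.0, 0.9, -2.0]    # response 0 is clearly better

# The pairwise label is identical in both cases, so the margin is lost.
label_close = pairwise_label(close_rewards, 0, 1)
label_far = pairwise_label(far_rewards, 0, 1)

# The ranking distribution keeps the margin: the top ranking is far
# more probable when response 0 is clearly ahead.
dist_close = ranking_distribution(close_rewards)
dist_far = ranking_distribution(far_rewards)

print(label_close, label_far)
print(round(dist_close[(0, 1, 2)], 3), round(dist_far[(0, 1, 2)], 3))
```

In both reward settings the pairwise label says only that response 0 beats response 1, while the distribution over rankings shifts noticeably. That difference is the kind of finer-grained supervisory signal the abstract refers to.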

Code & Models

Repositories

egangu/pad (PyTorch, official)

Videos

Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models · underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: ALIGN