PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

TL;DR
PLaD introduces a preference-based distillation method for LLMs that uses pseudo-preference pairs and ranking loss to improve student model calibration and performance without needing internal teacher states.
Contribution
The paper proposes PLaD, a novel distillation framework that addresses capacity gaps and calibration issues in LLMs using preference-based learning and pseudo-preference pairs.
Findings
PLaD improves student LLM performance on sequence generation tasks.
PLaD effectively calibrates student models without access to teacher internal states.
Experimental results show PLaD outperforms traditional distillation methods.
Abstract
Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate student's estimation of sequence likelihood, which steers the student's focus towards…
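To make the pseudo-preference idea concrete, here is a minimal sketch of a pairwise ranking loss over sequence likelihoods. It assumes a hinge (margin) form of the loss and length-normalized log-likelihoods; the abstract does not state the exact loss PLaD uses, so treat both choices, and the hypothetical names `sequence_log_likelihood`, `pairwise_ranking_loss`, and `batch_ranking_loss`, as illustrative assumptions rather than the authors' implementation. The inputs are the student's own per-token log-probabilities for each output, so no teacher internal states are needed.

```python
from typing import List, Sequence, Tuple


def sequence_log_likelihood(token_log_probs: Sequence[float],
                            length_normalize: bool = True) -> float:
    """Score a sequence from the student's per-token log-probabilities.

    Length normalization is an assumption made here so that long and
    short outputs are comparable.
    """
    total = sum(token_log_probs)
    if length_normalize and len(token_log_probs) > 0:
        return total / len(token_log_probs)
    return total


def pairwise_ranking_loss(preferred_ll: float,
                          rejected_ll: float,
                          margin: float = 0.0) -> float:
    """Hinge-style ranking loss (an assumed form, not confirmed by the source).

    The loss is zero once the student scores the preferred (teacher) output
    at least `margin` above the rejected (student) output.
    """
    return max(0.0, margin - (preferred_ll - rejected_ll))


def batch_ranking_loss(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                       margin: float = 0.0) -> float:
    """Average the ranking loss over pseudo-preference pairs.

    Each pair is (teacher_output_token_log_probs, student_output_token_log_probs),
    both evaluated under the student model.
    """
    if not pairs:
        return 0.0
    losses = [
        pairwise_ranking_loss(
            sequence_log_likelihood(teacher_lp),
            sequence_log_likelihood(student_lp),
            margin,
        )
        for teacher_lp, student_lp in pairs
    ]
    return sum(losses) / len(losses)
```

In this sketch, a pair where the student already assigns higher likelihood to the teacher output contributes no loss, while a pair where the student prefers its own output is penalized in proportion to the gap.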
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Focus · Knowledge Distillation
