PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Rongzhi Zhang; Jiaming Shen; Tianqi Liu; Haorui Wang; Zhen Qin; Feng Han; Jialu Liu; Simon Baumgartner; Michael Bendersky; Chao Zhang

arXiv:2406.02886·cs.CL·June 7, 2024

TL;DR

PLaD is a preference-based distillation method for LLMs that uses pseudo-preference pairs and a ranking loss to improve student model calibration and performance without access to the teacher's internal states.

Contribution

The paper proposes PLaD, a distillation framework that addresses the teacher-student capacity gap and the mis-calibration inherited by student models, using preference-based learning on pseudo-preference pairs.

Findings

01. PLaD improves student LLM performance on sequence generation tasks.

02. PLaD effectively calibrates student models without access to teacher internal states.

03. Experimental results show PLaD outperforms traditional distillation methods.

Abstract

Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate student's estimation of sequence likelihood, which steers the student's focus towards…
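The pairing and ranking idea in the abstract can be illustrated with a minimal sketch. It assumes a hinge-style pairwise loss over length-normalized sequence log-likelihoods computed by the student. The loss form, the `margin` parameter, the function names, and the toy probabilities are all hypothetical illustrations, not the paper's stated formulation.

```python
import math
from typing import List


def sequence_log_likelihood(token_probs: List[float]) -> float:
    """Length-normalized log-likelihood from per-token probabilities (assumed scoring rule)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def pairwise_ranking_loss(preferred_ll: float, rejected_ll: float, margin: float = 0.0) -> float:
    """Hinge loss: zero once the preferred sequence outscores the rejected one by `margin`."""
    return max(0.0, margin - (preferred_ll - rejected_ll))


# Hypothetical per-token probabilities that the STUDENT assigns to each sequence.
# The teacher's output plays the "preferred" role, the student's own output the "rejected" role.
student_probs_on_teacher_output = [0.25, 0.25]
student_probs_on_student_output = [0.5, 0.5]

preferred_ll = sequence_log_likelihood(student_probs_on_teacher_output)
rejected_ll = sequence_log_likelihood(student_probs_on_student_output)

# Positive here, because the student currently scores its own output higher.
loss_value = pairwise_ranking_loss(preferred_ll, rejected_ll)
print(loss_value)
```

In this toy case the loss is positive, so a gradient step would push the student to raise its likelihood of the teacher output relative to its own. Note that the sketch only needs teacher output text, not teacher logits or hidden states.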

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus · Knowledge Distillation