Direct Preference Knowledge Distillation for Large Language Models

Yixing Li; Yuxian Gu; Li Dong; Dequan Wang; Yu Cheng; Furu Wei

arXiv:2406.19774·cs.CL·April 8, 2025

TL;DR

This paper introduces Direct Preference Knowledge Distillation (DPKD), a knowledge distillation method for large language models that uses distribution divergence to represent a preference loss and an implicit reward function, training the student to prefer teacher outputs over its own.

Contribution

The paper proposes DPKD, a new two-stage knowledge distillation approach that incorporates implicit reward functions and distribution divergence for large language models.

Findings

01. DPKD outperforms baseline methods in response precision.

02. DPKD improves exact match percentage.

03. The approach is effective across models from 120M to 13B parameters.

Abstract

In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations and challenges in distillation of LLMs, including efficiency and insufficient measurement capabilities of traditional KL divergence. It is shown that LLMs can serve as an implicit reward function, which we define as a supplement to KL divergence. In this work, we propose Direct Preference Knowledge Distillation (DPKD) for LLMs. DPKD utilizes distribution divergence to represent the preference loss and implicit reward function. We re-formulate KD of LLMs into two stages: first optimizing an objective consisting of implicit reward and reverse KL divergence, and then improving the preference probability of teacher outputs over student outputs. We conducted experiments and analysis…
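
The second stage described in the abstract (raising the preference probability of teacher outputs over student outputs) can be illustrated with a small sketch. The abstract does not give the loss in closed form, so everything below is an assumption: it uses a DPO-style Bradley-Terry preference, takes the implicit reward to be a scaled log-ratio of student to teacher sequence probabilities, and the function name `preference_kd_loss`, its arguments, and the `beta` scale are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def preference_kd_loss(
    student_logp_teacher_out: float,
    teacher_logp_teacher_out: float,
    student_logp_student_out: float,
    teacher_logp_student_out: float,
    beta: float = 1.0,
) -> float:
    """Per-example preference loss (hypothetical form, not taken from the paper).

    Each argument is a sequence-level log-probability of one output given the
    same prompt. The teacher output is treated as the preferred response and
    the student output as the dispreferred one.
    """
    # Assumed implicit reward: scaled log-ratio of student to teacher likelihood.
    reward_teacher_out = beta * (student_logp_teacher_out - teacher_logp_teacher_out)
    reward_student_out = beta * (student_logp_student_out - teacher_logp_student_out)
    # Bradley-Terry preference: minimize -log sigmoid(reward margin).
    return -log_sigmoid(reward_teacher_out - reward_student_out)


if __name__ == "__main__":
    # Student is relatively more confident on the teacher's output: lower loss.
    print(preference_kd_loss(-4.0, -5.0, -3.0, -3.0))
    # Student is relatively less confident on the teacher's output: higher loss.
    print(preference_kd_loss(-6.0, -5.0, -3.0, -3.0))
```

Under these assumptions, the loss falls as the student assigns relatively more probability to the teacher's output than to its own, which is the direction of improvement the abstract describes.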

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Knowledge Distillation