Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment

Tianyu Peng; Jiajun Zhang

arXiv:2409.12545 · cs.CL · December 19, 2024

TL;DR

This paper introduces Ranking Loss based Knowledge Distillation (RLKD), a distillation method that uses a word-level ranking loss to keep the ranking of the student's peak predictions consistent with the teacher's multi-modal distribution, improving student model performance.

Contribution

The paper proposes a ranking loss-based approach for better multi-modal distribution alignment in knowledge distillation of LLMs, addressing inefficiencies of previous methods.

Findings

01. RLKD improves multi-modal distribution learning in student models.

02. Experimental results show significant performance gains in downstream tasks.

03. The method maintains compatibility with existing distillation objectives.

Abstract

Knowledge distillation (KD) is an effective model compression method that can transfer the internal capabilities of large language models (LLMs) to smaller ones. However, the multi-modal probability distribution predicted by teacher LLMs causes difficulties for student models to learn. In this paper, we first demonstrate the importance of multi-modal distribution alignment with experiments and then highlight the inefficiency of existing KD approaches in learning multi-modal distributions. To address this problem, we propose Ranking Loss based Knowledge Distillation (RLKD), which encourages the consistency of the ranking of peak predictions between the teacher and student models. By incorporating word-level ranking loss, we ensure excellent compatibility with existing distillation objectives while fully leveraging the fine-grained information between different categories in peaks of two…
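
To make the idea of ranking consistency between teacher and student peaks concrete, here is a minimal sketch for a single token position. It is not the paper's formulation: the abstract above is truncated and does not specify the loss, so the pairwise hinge form, the function names, and the parameters `k`, `margin`, and `weight` are all assumptions chosen for illustration. The sketch takes the teacher's top-`k` tokens as the "peaks", penalizes every pair the student orders differently, and adds that penalty to a standard forward KL term to illustrate how a ranking term can sit alongside an existing distillation objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(probs, k):
    """Indices of the k largest probabilities, highest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def pairwise_ranking_loss(teacher_logits, student_logits, k=3, margin=0.0):
    """Hypothetical pairwise hinge loss over the teacher's top-k tokens.

    For each pair (hi, lo) where the teacher ranks hi above lo, the
    student is penalized when its probability for hi does not exceed
    its probability for lo by at least `margin`.
    """
    teacher = softmax(teacher_logits)
    student = softmax(student_logits)
    peaks = top_k_indices(teacher, k)  # assumed definition of "peaks"

    loss, pairs = 0.0, 0
    for a in range(len(peaks)):
        for b in range(a + 1, len(peaks)):
            hi, lo = peaks[a], peaks[b]
            loss += max(0.0, margin - (student[hi] - student[lo]))
            pairs += 1
    return loss / pairs if pairs else 0.0


def forward_kl(teacher_logits, student_logits):
    """KL(teacher || student), a common existing distillation objective."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def combined_loss(teacher_logits, student_logits, k=3, weight=1.0):
    """Assumed combination: KL term plus a weighted ranking term."""
    kl = forward_kl(teacher_logits, student_logits)
    rank = pairwise_ranking_loss(teacher_logits, student_logits, k=k)
    return kl + weight * rank


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.5, -1.0]
    agrees = [3.0, 1.5, 0.2, -2.0]     # same order of the top-3 tokens
    disagrees = [0.5, 1.0, 2.0, -1.0]  # top-3 order reversed
    print("ranking loss (agrees):   ", pairwise_ranking_loss(teacher, agrees))
    print("ranking loss (disagrees):", pairwise_ranking_loss(teacher, disagrees))
    print("combined loss (disagrees):", combined_loss(teacher, disagrees))
```

A student that preserves the teacher's ordering of the top-`k` tokens incurs no ranking penalty in this sketch, even if its probabilities differ from the teacher's, while the KL term still measures the overall mismatch. In a training setup the same computation would be written with differentiable tensor operations and averaged over token positions; the pure-Python version here is only meant to show the structure of the objective.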

Code & Models

Repositories

pty72/rlkd (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Knowledge Distillation