Dynamic Knowledge Distillation for Pre-trained Language Models

Lei Li; Yankai Lin; Shuhuai Ren; Peng Li; Jie Zhou; Xu Sun

arXiv:2109.11295 · cs.CL · September 24, 2021

TL;DR

This paper introduces a dynamic knowledge distillation approach for pre-trained language models, allowing the student to adapt its learning process based on performance, data informativeness, and objective contributions, leading to improved efficiency and performance.

Contribution

It proposes a novel dynamic KD framework that adjusts teacher selection, data usage, and objectives during training, enhancing model compression and training efficiency.

Findings

01. Proper teacher selection boosts student performance.

02. Distilling on only the 10% of instances that are informative achieves comparable performance while greatly accelerating training.

03. Adjusting alignment objectives improves student outcomes.

Abstract

Knowledge distillation (KD) has been proven effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically, e.g., the student model aligns its output distribution to that of a selected teacher model on a pre-defined training dataset. In this paper, we explore whether dynamic knowledge distillation, which empowers the student to adjust the learning procedure according to its competency, benefits student performance and learning efficiency. We explore dynamic adjustments in three aspects: teacher model adoption, data selection, and KD objective adaptation. Experimental results show that (1) proper selection of the teacher model can boost the performance of the student model; (2) conducting KD with 10% informative instances achieves comparable performance while greatly accelerating training; (3) the student performance can be…
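
To make the abstract's terms concrete, here is a minimal sketch in plain Python of two of the ideas it mentions: the static KD objective (the student aligning its output distribution to the teacher's) and the selection of a small fraction of informative instances. It is an illustration under stated assumptions, not the authors' implementation. In particular, the function names (`kd_loss`, `select_informative`), the use of KL divergence with a temperature, and the use of student prediction entropy as the informativeness score are assumptions made for this example; the listing above does not specify them.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: KL divergence is used as the alignment objective here;
    it is one common choice for output-distribution distillation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log of zero when a probability underflows
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy(probs):
    """Shannon entropy of a probability distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_informative(student_logits_batch, ratio=0.1):
    """Return sorted indices of the `ratio` fraction of instances with highest entropy.

    Assumption: student prediction entropy stands in for "informativeness";
    this scoring rule is hypothetical and chosen only for illustration.
    """
    scores = [entropy(softmax(logits)) for logits in student_logits_batch]
    k = max(1, round(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

In a training loop, one might call `select_informative` on each batch of student logits and compute `kd_loss` only on the returned indices, so that the teacher forward pass and the loss are evaluated on roughly 10% of the instances.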

Code & Models

Repositories

lancopku/dynamickd (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Knowledge Distillation