Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing

Haoyu He; Xingjian Shi; Jonas Mueller; Sheng Zha; Mu Li; George Karypis

arXiv:2109.11105·cs.CL·September 24, 2021

Open Access

TL;DR

This paper systematically studies the impact of various components in knowledge distillation for NLP, introduces a meta framework called Distiller, and identifies key factors influencing performance across datasets.

Contribution

It proposes Distiller, a comprehensive meta framework for analyzing and optimizing knowledge distillation components in NLP, including a universal MI objective and an AutoDistiller algorithm.

Findings

01

Intermediate representation distillation is most critical for KD performance.

02

MI-$\alpha$ achieves superior results among MI objectives.

03

Data augmentation significantly benefits small datasets and models.

Abstract

We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student. To tease apart their effects, we propose Distiller, a meta KD framework that systematically combines a broad range of techniques across different stages of the KD pipeline, which enables us to quantify each component's contribution. Within Distiller, we unify commonly used objectives for distillation of intermediate representations under a universal mutual information (MI) objective and propose a class of MI-$\alpha$ objective functions with better bias/variance trade-off for estimating the MI between the teacher and the student. On a diverse set of NLP datasets, the…
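
To make the idea of an MI-based objective over intermediate representations concrete, here is a minimal sketch of one widely used MI lower bound, InfoNCE, computed over a batch of paired teacher and student vectors. This is not the paper's MI-$\alpha$ objective, whose definition is not given in this summary. The function names, the dot-product critic, and the toy vectors are all illustrative assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_lower_bound(teacher_reps, student_reps):
    """InfoNCE estimate of a lower bound on MI between paired representations.

    Assumption: teacher_reps[i] and student_reps[i] come from the same input
    (a positive pair); every other pairing in the batch serves as a negative.
    The critic is a plain dot product, chosen here only for simplicity.
    """
    n = len(teacher_reps)
    if n == 0 or n != len(student_reps):
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for i in range(n):
        scores = [dot(teacher_reps[i], student_reps[j]) for j in range(n)]
        # Log-probability of picking the true partner among n candidates.
        total += scores[i] - log_sum_exp(scores)
    # The estimate can never exceed log(n).
    return math.log(n) + total / n


# Toy batch: the student matches the teacher, or is paired with the wrong input.
teacher = [[5.0, 0.0], [0.0, 5.0]]
aligned = infonce_lower_bound(teacher, [[5.0, 0.0], [0.0, 5.0]])
shuffled = infonce_lower_bound(teacher, [[0.0, 5.0], [5.0, 0.0]])
```

In a distillation setting, the negative of this quantity would be minimized with respect to the student, so that student representations become more predictive of the teacher's. The estimate is capped at the log of the batch size, one reason estimators of this kind trade variance for bias.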

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification