Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation
Mitchell A. Gordon, Kevin Duh

TL;DR
This paper investigates why sequence-level knowledge distillation (SLKD) helps train smaller neural machine translation models. It argues that data augmentation and regularization effects, rather than data simplification, better explain SLKD's benefits, and augmentation strategies built on this view yield BLEU gains of 0.7-1.2 on TED Talks.
Contribution
The paper challenges the common hypothesis that SLKD works by simplifying noisy training data for under-parameterized students. It proposes instead viewing SLKD as data augmentation and regularization, and demonstrates augmentation strategies that improve translation quality.
Findings
Dropout regularization can be rendered unnecessary with augmentation.
SLKD benefits may stem from data augmentation effects, not data simplification.
Achieved BLEU score gains of 0.7-1.2 on TED Talks.
Abstract
Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points and find it unlikely in our case. Models trained on concatenations of original and "simplified" datasets generalize just as well as baseline SLKD. We then propose an alternative hypothesis through the lens of data augmentation and regularization. We try various augmentation strategies and observe that dropout regularization can become unnecessary. Our methods achieve BLEU gains of 0.7-1.2 on TED Talks.
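As an illustration of the two training sets the abstract contrasts, here is a minimal sketch of (a) the SLKD "simplified" dataset, where each reference target is replaced by the teacher's translation of the source, and (b) the concatenation of original and distilled data. It assumes a parallel corpus of (source, target) string pairs; `teacher_translate` is a hypothetical stand-in for the teacher model's decoding (typically beam search), and the function names are illustrative, not taken from the paper's code.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def distill(corpus: List[Pair], teacher_translate: Callable[[str], str]) -> List[Pair]:
    """Sequence-level KD data: keep each source, replace its reference target
    with the teacher's output (hypothetical teacher_translate callable)."""
    return [(src, teacher_translate(src)) for src, _ in corpus]


def concat_augment(original: List[Pair], distilled: List[Pair]) -> List[Pair]:
    """Concatenate original and distilled pairs, so each source appears twice:
    once with the human reference, once with the teacher output."""
    return original + distilled


if __name__ == "__main__":
    corpus = [("guten morgen", "good morning"), ("vielen dank", "thank you very much")]
    # Hypothetical toy teacher: a lookup table standing in for a trained NMT model.
    toy_table = {"guten morgen": "good morning", "vielen dank": "many thanks"}
    distilled = distill(corpus, lambda s: toy_table[s])
    augmented = concat_augment(corpus, distilled)
    for src, tgt in augmented:
        print(f"{src} -> {tgt}")
```

Under the data-augmentation view, the concatenated set gives the student multiple valid targets per source, rather than only a single "simplified" one.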
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Test · Knowledge Distillation · Dropout
