Autoregressive Knowledge Distillation through Imitation Learning

Alexander Lin, Jeremy Wohlwend, Howard Chen, and Tao Lei

arXiv:2009.07253·cs.CL·October 30, 2020

TL;DR

This paper introduces a knowledge distillation method for autoregressive models that is based on imitation learning. On language generation tasks, the resulting student models score higher than students trained from scratch and run inference up to 14 times faster.

Contribution

The paper presents a compression technique for autoregressive models that is designed to address the exposure bias problem and that outperforms other distillation algorithms on translation and summarization.

Findings

01

Student models score 1.4 to 4.8 BLEU/ROUGE points higher than those trained from scratch.

02

Inference speed increases by up to 14 times.

03

The method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation, on translation and summarization.

Abstract

The performance of autoregressive models on natural language generation tasks has dramatically improved due to the adoption of deep, self-attentive architectures. However, these gains have come at the cost of hindering inference speed, making state-of-the-art models cumbersome to deploy in real-world, time-sensitive settings. We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation. The algorithm is designed to address the exposure bias problem. On prototypical language generation tasks such as translation and summarization, our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation. Student models trained with our method attain 1.4 to 4.8 BLEU/ROUGE points higher than those trained from scratch, while increasing inference speed by up to 14 times in…
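The abstract names the idea but does not spell out the algorithm. As a rough illustration only, the following Python sketch shows one generic way an imitation-learning view of distillation can target exposure bias: the student is scored against the teacher's next-token distribution on prefixes that mix reference tokens with the student's own predictions. This is not the authors' method. Every name here (`build_prefix`, `distillation_loss`, the mixing probability `beta`, and the toy teacher and student) is a hypothetical stand-in chosen for the example.

```python
import math
import random


def soft_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy H(teacher, student) for one next-token distribution."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))


def build_prefix(reference, student_next_token, beta, rng):
    """Build a training prefix one token at a time.

    With probability beta the next token is copied from the reference
    (teacher forcing). Otherwise it is the student's own prediction, so the
    student is also trained on states it would reach at inference time.
    beta is a hypothetical mixing parameter, not taken from the paper.
    """
    prefix = []
    for ref_token in reference:
        if rng.random() < beta:
            prefix.append(ref_token)
        else:
            prefix.append(student_next_token(prefix))
    return prefix


def distillation_loss(prefix, teacher_dist, student_dist):
    """Average per-step cross-entropy to the teacher along one prefix."""
    if not prefix:
        return 0.0
    losses = [
        soft_cross_entropy(teacher_dist(prefix[:t]), student_dist(prefix[:t]))
        for t in range(len(prefix))
    ]
    return sum(losses) / len(losses)


# Toy stand-ins over a 3-token vocabulary (assumptions for illustration).
def toy_teacher_dist(prefix):
    return [0.7, 0.2, 0.1]


def toy_student_dist(prefix):
    return [1 / 3, 1 / 3, 1 / 3]


def toy_student_next_token(prefix):
    # Greedy choice; ties resolve to the lowest token id.
    probs = toy_student_dist(prefix)
    return max(range(len(probs)), key=lambda i: probs[i])


rng = random.Random(0)
reference = [2, 1, 0, 1]
mixed_prefix = build_prefix(reference, toy_student_next_token, beta=0.5, rng=rng)
loss = distillation_loss(mixed_prefix, toy_teacher_dist, toy_student_dist)
```

Setting `beta=1.0` reduces the sketch to ordinary teacher-forced distillation, while `beta=0.0` trains only on student-generated prefixes. A real system would replace the toy functions with neural models and backpropagate through the loss.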

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications