DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation

Sajad Norouzi; Rasa Hosseinzadeh; Felipe Perez; Maksims Volkovs

arXiv:2206.02999 · cs.CL · June 13, 2023

TL;DR

DiMS is a distillation technique that reduces the number of decoding steps in iterative non-autoregressive transformers, maintaining translation quality while improving computational efficiency.

Contribution

Introduces DiMS, a distillation method that lets a single decoding step capture much of the improvement normally gained from several iterative steps, at no extra inference cost.

Findings

01 Achieves a 7.8 BLEU improvement in single-step translation on distilled data

02 Achieves a 12.9 BLEU improvement in single-step translation on raw data

03 Reaches a given translation quality with fewer decoding steps

Abstract

The computational benefits of iterative non-autoregressive transformers decrease as the number of decoding steps increases. As a remedy, we introduce Distill Multiple Steps (DiMS), a simple yet effective distillation technique that decreases the number of steps required to reach a given translation quality. The distilled model enjoys the computational benefits of early iterations while preserving the enhancements from several iterative steps. DiMS relies on two models, a student and a teacher. The student is optimized to predict the output of the teacher after multiple decoding steps, while the teacher follows the student via a slow-moving average. The moving average keeps the teacher's knowledge up to date and enhances the quality of the labels the teacher provides. During inference, only the student is used for translation, so no additional computation is added. We verify the effectiveness…
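
The slow-moving average that lets the teacher follow the student can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ema_update`, the dictionary-of-scalars parameter format, and the decay value 0.99 are all assumptions chosen for illustration, and the abstract does not state the actual decay used.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher parameters moved slightly toward the student's.

    `teacher` and `student` map parameter names to floats (a hypothetical,
    simplified stand-in for real model weights). A `decay` close to 1 means
    the teacher changes slowly.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy example with one scalar weight per model (illustrative values only).
teacher = {"w": 0.0}
student = {"w": 1.0}

# In training, the student would also change between updates; it is held
# fixed here so the teacher's slow drift toward it is easy to see.
for _ in range(100):
    teacher = ema_update(teacher, student, decay=0.99)

print(teacher["w"])
```

With the student held fixed, the teacher's weight after n updates is 1 - decay^n of the way to the student's, so a decay near 1 gives a teacher that lags the student and changes smoothly, which is the behavior the abstract describes as a slow-moving average.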

Code & Models

Repositories

layer6ai-labs/dims (PyTorch, official)

Taxonomy

Topics: Model Reduction and Neural Networks · Natural Language Processing Techniques