Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators
Xinyou Wang, Zaixiang Zheng, Shujian Huang

TL;DR
This paper introduces a multi-task learning framework with weak autoregressive decoders to enhance non-autoregressive translation models, resulting in improved accuracy without extra decoding costs.
Contribution
It proposes a simple, model-agnostic multi-task learning approach that strengthens NAR models by training them alongside weak AR decoders, providing more informative learning signals.
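The training objective described above can be sketched as a weighted sum of the NAR loss and the losses of the weak AR decoders. This is a minimal illustration, not the authors' implementation: the function names, the `ar_weight` parameter, and the choice to average over AR decoders are all assumptions made here, since this summary does not state how the losses are combined.

```python
import math


def cross_entropy(log_probs, targets):
    """Mean negative log-likelihood of the reference tokens.

    log_probs: one dict per target position, mapping token -> log-probability.
    targets:   reference tokens, same length as log_probs.
    """
    return -sum(lp[tok] for lp, tok in zip(log_probs, targets)) / len(targets)


def multitask_loss(nar_log_probs, weak_ar_log_probs, targets, ar_weight=1.0):
    """Hypothetical joint loss: NAR loss plus weighted weak-AR-decoder loss.

    weak_ar_log_probs holds one prediction sequence per weak AR decoder.
    Averaging over decoders and the ar_weight scalar are assumptions.
    """
    nar_loss = cross_entropy(nar_log_probs, targets)
    ar_losses = [cross_entropy(lp, targets) for lp in weak_ar_log_probs]
    # With no AR decoders attached, only the NAR term remains.
    ar_loss = sum(ar_losses) / len(ar_losses) if ar_losses else 0.0
    return nar_loss + ar_weight * ar_loss


# Toy example with a two-token target and one weak AR decoder.
targets = ["a", "b"]
nar = [{"a": math.log(0.5), "b": math.log(0.5)}] * 2
weak_ar = [[{"a": math.log(0.25), "b": math.log(0.25)}] * 2]
loss = multitask_loss(nar, weak_ar, targets, ar_weight=0.5)
```

Because the AR decoders appear only in the training loss, they can be discarded after training, which is consistent with the claim that decoding cost is unchanged.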
Findings
Consistent accuracy improvements on WMT and IWSLT datasets.
No additional decoding overhead introduced.
Applicable to multiple NAR baseline models.
Abstract
Recently, non-autoregressive (NAR) neural machine translation models have received increasing attention due to their efficient parallel decoding. However, the probabilistic framework of NAR models necessitates a conditional independence assumption on target sequences, falling short of characterizing human language data. This drawback results in less informative learning signals for NAR models under conventional MLE training, thereby yielding unsatisfactory accuracy compared to their autoregressive (AR) counterparts. In this paper, we propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals. During the training stage, we introduce a set of sufficiently weak AR decoders that rely solely on the information provided by the NAR decoder to make predictions, forcing the NAR decoder to become stronger or else it will be unable to support its weak AR…
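The conditional independence assumption mentioned in the abstract can be made concrete by comparing the standard AR and NAR factorizations. The notation below (source sentence x, target tokens y_1, ..., y_T) is assumed here for illustration and is not taken from the paper; practical NAR models also typically predict the target length T separately.

```latex
% Autoregressive: each token conditions on all previously generated tokens.
p_{\mathrm{AR}}(y \mid x) = \prod_{t=1}^{T} p\left(y_t \mid y_{<t},\, x\right)

% Non-autoregressive: tokens are conditionally independent given the source,
% so all positions can be predicted in parallel.
p_{\mathrm{NAR}}(y \mid x) = \prod_{t=1}^{T} p\left(y_t \mid x\right)
```

Dropping the dependence on y_{<t} is what enables parallel decoding, and it is also why the per-token MLE signal carries less information about how target tokens depend on one another.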
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
