Directed Acyclic Transformer for Non-Autoregressive Machine Translation
Fei Huang, Hao Zhou, Yang Liu, Hang Li, Minlie Huang

TL;DR
The paper introduces DA-Transformer, a non-autoregressive model that arranges its hidden states in a directed acyclic graph (DAG) whose paths each correspond to one translation, so a single graph captures multiple translations and translation quality improves substantially without relying on knowledge distillation.
Contribution
It proposes the Directed Acyclic Transformer, which captures multiple translations as paths in a DAG, enabling fast, parallel decoding with higher BLEU than previous NATs.
Findings
Outperforms previous NATs by about 3 BLEU on average when trained on the raw WMT benchmark data
Is the first NAT to achieve results competitive with autoregressive Transformers without knowledge distillation
Demonstrates effective modeling of multiple translations in a DAG structure
Abstract
Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, which is the first NAT model that achieves competitive results with autoregressive Transformers without relying on knowledge distillation.
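The idea that each path of the DAG corresponds to a specific translation can be illustrated with a toy example. The sketch below is not the authors' implementation: the five vertices, their tokens, the transition probabilities, and the helper names (`enumerate_paths`, `greedy_path`, `decode`) are all hypothetical, chosen only to show how one graph can hold several translations and how a single path can be picked by following the most probable outgoing edge at each vertex.

```python
import numpy as np

# Hypothetical toy DAG: each vertex carries one predicted token.
# In a real model these would come from the decoder's hidden states.
tokens = ["I", "like", "love", "cats", "<eos>"]

# trans[i, j] = assumed probability of moving from vertex i to vertex j.
# Only entries with j > i are non-zero, which keeps the graph acyclic.
trans = np.array([
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])


def enumerate_paths(trans, start, end):
    """Yield (path, probability) for every path from start to end."""
    if start == end:
        yield [end], 1.0
        return
    for nxt in range(start + 1, trans.shape[0]):
        p = trans[start, nxt]
        if p > 0:
            for rest, q in enumerate_paths(trans, nxt, end):
                yield [start] + rest, float(p * q)


def greedy_path(trans):
    """Follow the most probable outgoing edge from the first to the last vertex."""
    path, last = [0], trans.shape[0] - 1
    while path[-1] != last:
        path.append(int(np.argmax(trans[path[-1]])))
    return path


def decode(path):
    """Turn a vertex path into a sentence, dropping the end marker."""
    return " ".join(tokens[v] for v in path if tokens[v] != "<eos>")


all_paths = {decode(p): prob for p, prob in enumerate_paths(trans, 0, 4)}
best = decode(greedy_path(trans))
print(all_paths)
print(best)
```

In this toy graph the two paths spell out "I like cats" and "I love cats", and greedy selection follows the higher-probability edge out of the first vertex. Every vertex's token and every transition probability is available at once, so no step waits on a previously generated token; only the cheap path selection walks the graph in order.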
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Bioinformatics
