Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation
Chenze Shao, Zhengrui Ma, Yang Feng

TL;DR
This paper introduces a Viterbi decoding method for DA-Transformer in non-autoregressive machine translation, improving translation accuracy without sacrificing decoding speed.
Contribution
It presents a Viterbi decoding framework for DA-Transformer that is guaranteed to find the jointly optimal translation and decoding path under any length constraint.
Findings
Consistent performance improvement over baseline DA-Transformer
Maintains similar decoding speedup as original models
Enhances translation accuracy in non-autoregressive MT
Abstract
Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms the global translation accuracy. In this paper, we present a Viterbi decoding framework for DA-Transformer, which is guaranteed to find the jointly optimal solution for the translation and decoding path under any length constraint. Experimental results demonstrate that our approach consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.
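To make the idea of a jointly optimal path under a length constraint concrete, here is a minimal Python sketch of Viterbi-style dynamic programming over a small directed acyclic graph. It is an illustration under stated assumptions, not the authors' implementation: the function name `viterbi_dag`, the inputs `trans` (transition log-probabilities between vertices) and `emit` (the best token and its log-probability at each vertex), and the convention that a path runs from the first vertex to the last are all hypothetical choices made for this example.

```python
import math

NEG_INF = -math.inf

def viterbi_dag(trans, emit, max_len):
    """Best path from vertex 0 to the last vertex for every path length.

    trans[i][j]: log-probability of moving from vertex i to vertex j (used for j > i).
    emit[i]:     (token, log_prob) for the most likely token at vertex i.
    Returns {length: (score, path)} where length counts visited vertices.
    All names and input shapes are assumptions made for this sketch.
    """
    n = len(emit)
    # best[t][j]: best log-score of a path that visits t + 1 vertices and ends at j
    best = [[NEG_INF] * n for _ in range(max_len)]
    back = [[-1] * n for _ in range(max_len)]
    best[0][0] = emit[0][1]

    for t in range(1, max_len):
        for j in range(1, n):
            for i in range(j):  # edges only go forward, so the graph is acyclic
                if best[t - 1][i] == NEG_INF:
                    continue
                score = best[t - 1][i] + trans[i][j] + emit[j][1]
                if score > best[t][j]:
                    best[t][j] = score
                    back[t][j] = i

    results = {}
    for t in range(max_len):
        if best[t][n - 1] == NEG_INF:
            continue  # no path of this length reaches the final vertex
        path = [n - 1]
        for k in range(t, 0, -1):  # follow back-pointers to the start vertex
            path.append(back[k][path[-1]])
        path.reverse()
        results[t + 1] = (best[t][n - 1], path)
    return results

# Toy graph with three vertices: start, one content vertex, end.
emit = [("<s>", 0.0), ("a", math.log(0.5)), ("</s>", 0.0)]
trans = [
    [NEG_INF, math.log(0.9), math.log(0.1)],
    [NEG_INF, NEG_INF, 0.0],
    [NEG_INF, NEG_INF, NEG_INF],
]
results = viterbi_dag(trans, emit, max_len=3)
for length, (score, path) in sorted(results.items()):
    print(length, path, [emit[v][0] for v in path])
```

Because the table keeps one best score per (length, vertex) pair, the same pass yields the optimal path for every admissible length. How to choose among those lengths (for example, by some length normalization) is left to the caller in this sketch.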
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning in Bioinformatics · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam · Dense Connections · Softmax · Label Smoothing · Multi-Head Attention
