Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Chenze Shao; Zhengrui Ma; Yang Feng

arXiv:2210.05193 · cs.CL · March 3, 2023 · 1 citation

TL;DR

This paper introduces a Viterbi decoding method for DA-Transformer in non-autoregressive machine translation, improving translation quality while maintaining a similar decoding speedup.

Contribution

It presents a Viterbi decoding framework for DA-Transformer that is guaranteed to find the jointly optimal translation and decoding path under any length constraint.

Findings

01

Consistent performance improvement over baseline DA-Transformer

02

Maintains similar decoding speedup as original models

03

Enhances translation accuracy in non-autoregressive MT

Abstract

Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms the global translation accuracy. In this paper, we present a Viterbi decoding framework for DA-Transformer, which guarantees to find the joint optimal solution for the translation and decoding path under any length constraint. Experimental results demonstrate that our approach consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.
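The idea of searching for a jointly optimal translation and path under a length constraint can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' implementation: it assumes the graph is given as per-vertex best-token log-probabilities (`token_logp`, a hypothetical input) and an upper-triangular matrix of edge log-probabilities (`trans_logp`, also hypothetical), and that a valid path starts at the first vertex and ends at the last one. For a fixed target length it returns the highest-scoring path with exactly that many vertices.

```python
import math

NEG_INF = float("-inf")

def viterbi_dag(token_logp, trans_logp, length):
    """Best path with exactly `length` vertices from vertex 0 to the last vertex.

    token_logp[i]    : log-prob of the most likely token at vertex i (assumed input)
    trans_logp[j][i] : log-prob of the edge j -> i, read only for j < i (assumed input)
    Returns (score, path), or (NEG_INF, []) when no path of that length exists.
    """
    n = len(token_logp)
    # delta[m][i]: best log-score of a path with m + 1 vertices that ends at vertex i
    delta = [[NEG_INF] * n for _ in range(length)]
    back = [[-1] * n for _ in range(length)]
    delta[0][0] = token_logp[0]  # every path starts at vertex 0

    for m in range(1, length):
        for i in range(1, n):
            for j in range(i):  # edges only go forward, so the graph stays acyclic
                if delta[m - 1][j] == NEG_INF:
                    continue
                score = delta[m - 1][j] + trans_logp[j][i] + token_logp[i]
                if score > delta[m][i]:
                    delta[m][i] = score
                    back[m][i] = j

    best = delta[length - 1][n - 1]  # the path must end at the last vertex
    if best == NEG_INF:
        return NEG_INF, []

    # Follow back-pointers from the last vertex to recover the path.
    path = [n - 1]
    for m in range(length - 1, 0, -1):
        path.append(back[m][path[-1]])
    path.reverse()
    return best, path


# Toy graph with 4 vertices; all probabilities are made up for illustration.
token_logp = [math.log(p) for p in (1.0, 0.9, 0.5, 1.0)]
trans_logp = [[NEG_INF] * 4 for _ in range(4)]
for (j, i), p in {(0, 1): 0.6, (0, 2): 0.3, (0, 3): 0.1,
                  (1, 2): 0.2, (1, 3): 0.8, (2, 3): 1.0}.items():
    trans_logp[j][i] = math.log(p)

score, path = viterbi_dag(token_logp, trans_logp, length=3)
print(path, math.exp(score))
```

Because the token at each vertex is chosen independently once the path is fixed, taking the per-vertex argmax token and then running the path search gives the joint optimum for that length in this simplified setting. Comparing candidates across several lengths would need an additional scoring rule, which this sketch leaves out.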

Code & Models

Repositories

thu-coai/da-transformer (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning in Bioinformatics · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam · Dense Connections · Softmax · Label Smoothing · Multi-Head Attention