Decoder-only Streaming Transformer for Simultaneous Translation
Shoutao Guo, Shaolei Zhang, Yang Feng

TL;DR
This paper introduces the Decoder-only Streaming Transformer (DST), which the authors describe as the first Decoder-only model for simultaneous machine translation. DST uses a streaming self-attention mechanism, and the authors report state-of-the-art translation results.
Contribution
The paper proposes a Decoder-only architecture for simultaneous machine translation (SiMT), including a streaming self-attention mechanism and a position encoding strategy, to address the training and inference challenges of applying Decoder-only models to SiMT.
Findings
Achieves state-of-the-art performance on three translation tasks.
Shows that streaming self-attention (SSA) can effectively assess the translation policy.
Outperforms existing Encoder-Decoder SiMT models.
Abstract
Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix. To achieve good performance, it leverages the relationship between source and target prefixes to extract a policy to guide the generation of translations. Although existing SiMT methods primarily focus on the Encoder-Decoder architecture, we explore the potential of Decoder-only architecture, owing to its superior performance in various tasks and its inherent compatibility with SiMT. However, directly applying the Decoder-only architecture to SiMT poses challenges in terms of training and inference. To alleviate the above problems, we propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST). Specifically, DST separately encodes the positions of the source and target prefixes, ensuring that the…
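The idea of encoding source and target prefix positions separately can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `separate_position_ids` and `shared_position_ids` are hypothetical, and the sketch assumes the Decoder-only input is the source prefix followed by the target prefix, with each prefix numbered from 0. It only compares how position ids behave under the two numbering schemes when the source prefix grows.

```python
from typing import List


def separate_position_ids(num_source: int, num_target: int) -> List[int]:
    """Position ids for a [source prefix ; target prefix] sequence.

    Hypothetical helper (an assumption, not from the paper): source and
    target tokens are numbered independently, each starting at 0.
    """
    if num_source < 0 or num_target < 0:
        raise ValueError("prefix lengths must be non-negative")
    source_ids = list(range(num_source))
    target_ids = list(range(num_target))
    return source_ids + target_ids


def shared_position_ids(num_source: int, num_target: int) -> List[int]:
    """Baseline for comparison: one running counter over the whole sequence."""
    if num_source < 0 or num_target < 0:
        raise ValueError("prefix lengths must be non-negative")
    return list(range(num_source + num_target))


if __name__ == "__main__":
    # Two target tokens generated so far; the source prefix grows from 3 to 5.
    for num_source in (3, 5):
        separate = separate_position_ids(num_source, 2)
        shared = shared_position_ids(num_source, 2)
        print(num_source, separate[num_source:], shared[num_source:])
```

In this sketch, the target ids under separate numbering are the same for both source lengths, whereas under the shared counter they shift each time a new source token is read.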
Taxonomy
Topics: Advanced Data Compression Techniques
Methods: Attention Is All You Need · Dynamic Sparse Training · Softmax · Focus · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam
