Efficient Wait-k Models for Simultaneous Machine Translation

Maha Elbayad; Laurent Besacier; Jakob Verbeek

arXiv:2005.08595·cs.CL·August 5, 2020

TL;DR

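This paper studies wait-k decoders for simultaneous machine translation in low-resource spoken-language settings (IWSLT datasets). It shows that models trained with unidirectional encoders and across multiple values of k generalize well over a wide range of latency levels, for both Transformer and 2D-convolutional architectures.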

Contribution

The paper improves the training of wait-k models by using unidirectional encoders and by training across multiple values of k, and it compares Transformer and 2D-convolutional architectures under this setup.
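The two training ingredients named above can be pictured as attention masks plus a per-batch choice of k. The following is a minimal sketch and not the authors' implementation: the function names, the boolean-mask convention (`True` means "may attend"), and the uniform sampling of k per batch are all assumptions made for illustration. The summary only states that training uses unidirectional encoders and multiple values of k.

```python
import random
from typing import List

Mask = List[List[bool]]


def unidirectional_encoder_mask(src_len: int) -> Mask:
    """mask[j][i] is True when source position j may attend to source position i.

    A unidirectional encoder only looks at the current and earlier positions,
    so encoder states do not change when new source tokens arrive.
    """
    return [[i <= j for i in range(src_len)] for j in range(src_len)]


def waitk_cross_attention_mask(k: int, src_len: int, tgt_len: int) -> Mask:
    """mask[t][i] is True when target step t (0-indexed) may attend to source i.

    Under wait-k, the (t + 1)-th target token sees the first
    min(k + t, src_len) source tokens.
    """
    return [
        [i < min(k + t, src_len) for i in range(src_len)]
        for t in range(tgt_len)
    ]


def sample_k(rng: random.Random, k_min: int, k_max: int) -> int:
    """Pick a value of k for one training batch (assumed: uniform over a range)."""
    return rng.randint(k_min, k_max)


if __name__ == "__main__":
    rng = random.Random(0)
    k = sample_k(rng, 1, 5)  # hypothetical range of k values
    enc_mask = unidirectional_encoder_mask(src_len=4)
    cross_mask = waitk_cross_attention_mask(k, src_len=4, tgt_len=3)
    print("k =", k)
    for row in cross_mask:
        print(["x" if allowed else "." for allowed in row])
```

Because the encoder mask is causal, the same encoder states serve every value of k, which is what makes switching k between batches cheap in this sketch.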

Findings

1. Wait-k models generalize well across latency levels.

2. 2D-convolutional architecture is competitive with Transformers.

3. Models perform effectively in low-resource spoken language scenarios.

Abstract

Simultaneous machine translation consists of starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem. They first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets. We improve training of these models using unidirectional encoders, and training across multiple values of k. Experiments with Transformer and 2D-convolutional architectures show that our wait-k models generalize well across a wide range of latency levels. We also show that the 2D-convolution architecture is competitive with Transformers for simultaneous translation of spoken language.
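The read/write schedule described in the abstract can be made concrete with a short sketch. It is an illustration under stated assumptions, not code from the paper: the function name `waitk_actions` is hypothetical, the target length is passed in explicitly (a real decoder stops when it emits an end-of-sequence token), and once the source is exhausted the decoder is assumed to keep writing without further reads.

```python
from typing import List


def waitk_actions(k: int, src_len: int, tgt_len: int) -> List[str]:
    """Return the READ/WRITE action sequence of a wait-k decoder.

    Before writing the t-th target token (1-indexed), the decoder must have
    read min(k + t - 1, src_len) source tokens.
    """
    actions: List[str] = []
    tokens_read = 0
    for t in range(1, tgt_len + 1):
        needed = min(k + t - 1, src_len)
        # Read until enough source context is available for this step.
        while tokens_read < needed:
            actions.append("READ")
            tokens_read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    # k = 2, four source tokens, four target tokens.
    print(waitk_actions(k=2, src_len=4, tgt_len=4))
    # ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```

Small k gives low latency because writing starts early. A k at least as large as the source length reads the whole input before the first write, which is ordinary offline translation.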

Code & Models

Repositories

elbayadm/attn2d/tree/master/examples/waitk
PyTorch · Official

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding