Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

Matthew Raffel; Drew Penney; Lizhong Chen

arXiv:2307.01377·cs.CL·July 6, 2023

TL;DR

This paper introduces Shiftable Context, a method to align training and inference contexts in segment-based transformers for simultaneous speech translation, improving translation quality across multiple language pairs.

Contribution

The paper proposes Shiftable Context, a simple scheme that keeps segment and context sizes consistent between training and inference, addressing the context mismatch in streaming translation models.

Findings

01

Achieves average BLEU score improvements of over 1.8 points across language pairs.

02

Maintains minimal impact on computation-aware Average Lagging.

03

Applicable to segment-based transformers for streaming tasks.
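For readers unfamiliar with the latency metric named in finding 02, the sketch below computes Average Lagging in its original text-to-text form (Ma et al., 2019). It is an illustration only, not the evaluation code used in the paper: the function name and the `delays` representation are assumptions, and the computation-aware variant reported for speech measures delays in elapsed time, including model computation, rather than in source units read.

```python
def average_lagging(delays, source_length):
    """Average Lagging for one sentence (text-to-text form).

    delays[t] is the number of source units read before target token t
    (0-indexed) was emitted. Hypothetical helper for illustration.
    """
    if not delays or source_length <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    # gamma = target length / source length; rescales target positions
    # onto the source axis so the ideal policy has zero lag.
    gamma = len(delays) / source_length

    total = 0.0
    tau = 0
    for t, read_so_far in enumerate(delays):
        total += read_so_far - t / gamma
        tau = t + 1
        # Stop at the first token emitted after the full source was read.
        if read_so_far >= source_length:
            break
    return total / tau


# A wait-3 policy on a 6-token source and 6-token target lags by 3 units.
print(average_lagging([3, 4, 5, 6, 6, 6], source_length=6))  # 3.0
```

An offline system that reads the whole source before emitting anything has a lag equal to the source length under this definition, which is why lower values indicate a more responsive simultaneous system.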

Abstract

Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even in the presence of partially filled segments due to the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MuST-C dataset demonstrate that when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the…
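The mismatch described in the abstract can be made concrete with a small indexing sketch. This is one possible reading of the idea, not the authors' implementation: the function name, the `shiftable` flag, and the use of a left context only are assumptions made for illustration. With a fixed context, a partially filled final segment produces a shorter attention window than the model saw for full segments; shifting the window start back keeps the window size constant.

```python
def context_windows(num_frames, segment_size, left_context, shiftable):
    """Return (start, end) frame windows, one per segment.

    Hypothetical helper: each window covers a segment plus the frames
    to its left that the segment may attend to.
    """
    if segment_size <= 0 or left_context < 0:
        raise ValueError("segment_size must be positive, left_context non-negative")

    full_window = segment_size + left_context
    windows = []
    for seg_start in range(0, num_frames, segment_size):
        # The last segment is partially filled if the stream ends early.
        seg_end = min(seg_start + segment_size, num_frames)
        if shiftable:
            # Assumed behaviour: shift the start back so the window keeps
            # its full size even when the segment is partially filled.
            win_start = max(0, seg_end - full_window)
        else:
            # Fixed context: always exactly left_context frames before
            # the segment, so a partial segment shrinks the window.
            win_start = max(0, seg_start - left_context)
        windows.append((win_start, seg_end))
    return windows


# 10 frames, segments of 4, left context of 2: the last segment has 2 frames.
print(context_windows(10, 4, 2, shiftable=False))  # [(0, 4), (2, 8), (6, 10)]
print(context_windows(10, 4, 2, shiftable=True))   # [(0, 4), (2, 8), (4, 10)]
```

In the fixed case the final window spans 4 frames instead of 6, while the shifted variant spans 6. The first window is short in both cases because no earlier frames exist at the start of the stream.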

Code & Models

Repositories

osu-starlab/shiftablecontext (PyTorch, official)

Videos

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection