Efficient Monotonic Multihead Attention
Xutai Ma, Anna Sun, Siqi Ouyang, Hirofumi Inaguma, Paden Tomasello

TL;DR
This paper presents EMMA, a monotonic multihead attention model for simultaneous speech-to-text translation. It features numerically stable alignment estimation and improved training strategies, and it achieves state-of-the-art results on the Spanish and English translation task.
Contribution
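The paper introduces EMMA, an efficient monotonic multihead attention mechanism with numerically stable, unbiased alignment estimation. It also presents improved training and inference strategies: simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance.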
Findings
Achieves state-of-the-art results in simultaneous speech-to-text translation on the Spanish and English translation task
Demonstrates numerically stable and unbiased monotonic alignment estimation
Improves training and inference efficiency
Abstract
We introduce the Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically stable and unbiased monotonic alignment estimation. In addition, we present improved training and inference strategies, including simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance. The experimental results demonstrate that the proposed model attains state-of-the-art performance in simultaneous speech-to-text translation on the Spanish and English translation task.
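To make "monotonic alignment estimation" concrete, here is a minimal sketch of the quantity being estimated. This page does not give EMMA's own formulation, so the code below is not the paper's estimator. It is the classic sequential recurrence for expected monotonic alignment from earlier monotonic attention work, shown for illustration only. The function name `expected_alignment` and the input layout are assumptions made for this example.

```python
import numpy as np

def expected_alignment(p):
    """Expected monotonic alignment from stepwise selection probabilities.

    p[i, j] is the probability that output step i stops at input position j,
    given that it has reached j. Returns alpha with the same shape, where
    alpha[i, j] is the probability that output step i attends to input j.
    Illustrative sketch only; not the estimator proposed in the paper.
    """
    p = np.asarray(p, dtype=float)
    n_out, n_in = p.shape
    alpha = np.zeros_like(p)

    # Before the first output step, the model sits at the first input position.
    prev = np.zeros(n_in)
    prev[0] = 1.0

    for i in range(n_out):
        q = 0.0  # probability of reaching position j without having stopped yet
        for j in range(n_in):
            # Either we skipped position j-1 on this step, or the previous
            # output step stopped exactly at j and we resume from there.
            carried = (1.0 - p[i, j - 1]) * q if j > 0 else 0.0
            q = carried + prev[j]
            alpha[i, j] = p[i, j] * q
        prev = alpha[i]
    return alpha
```

For example, `expected_alignment(np.full((2, 3), 0.5))` gives `[0.5, 0.25, 0.125]` as its first row. The loop form is sequential over input positions; closed-form parallel versions of this recurrence involve dividing by cumulative products of `1 - p`, which is where numerical stability becomes a concern.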
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
