Efficient Monotonic Multihead Attention

Xutai Ma; Anna Sun; Siqi Ouyang; Hirofumi Inaguma; Paden Tomasello

arXiv:2312.04515 · cs.CL · December 8, 2023 · 5 cites

TL;DR

This paper presents EMMA, a monotonic multihead attention model for simultaneous speech-to-text translation. It features numerically stable, unbiased alignment estimation and improved training and inference strategies, and it achieves state-of-the-art results on the Spanish and English translation task.

Contribution

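The paper introduces EMMA, an efficient monotonic multihead attention mechanism with numerically stable, unbiased monotonic alignment estimation. It also presents improved training and inference strategies: simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance.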

Findings

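01 Achieves state-of-the-art results in simultaneous speech-to-text translation on the Spanish and English translation task

02 Provides numerically stable and unbiased monotonic alignment estimation

03 Improves training and inference strategies, including simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance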

Abstract

We introduce the Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically-stable and unbiased monotonic alignment estimation. In addition, we present improved training and inference strategies, including simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance. The experimental results demonstrate that the proposed model attains state-of-the-art performance in simultaneous speech-to-text translation on the Spanish and English translation task.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis