Attention as a Guide for Simultaneous Speech Translation

Sara Papi; Matteo Negri; Marco Turchi

arXiv:2212.07850 · cs.CL · October 19, 2023 · 1 cites

TL;DR

This paper introduces an attention-based policy for simultaneous speech translation that leverages encoder-decoder attention scores to improve real-time translation performance and latency.

Contribution

The paper is the first to analyze encoder-decoder attention behavior in speech translation, and it uses that analysis to design a new inference policy (EDAtt) for simultaneous speech translation.

Findings

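01 On en->{de, es}, the EDAtt policy achieves overall better results than the SimulST state of the art.

02 The improvement is most pronounced in computational-aware latency.

03 Encoder-decoder attention scores between audio input and textual output can be used to guide inference in real time.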

Abstract

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results compared to the SimulST state of the art, especially in terms of computational-aware latency.
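The idea of letting attention scores guide real-time inference can be illustrated with a minimal sketch. It assumes one plausible decision rule: a candidate token is emitted only while the attention mass it places on the most recent audio frames stays below a threshold, and emission stops at the first token that fails this check. The function name `tokens_to_emit` and the parameters `alpha` and `last_frames` are hypothetical names chosen for illustration; they are not taken from the text above and this is not presented as the authors' implementation.

```python
from typing import Sequence


def tokens_to_emit(
    attention: Sequence[Sequence[float]],
    alpha: float = 0.5,      # hypothetical threshold on recent attention mass
    last_frames: int = 2,    # hypothetical number of "most recent" frames
) -> int:
    """Return how many leading candidate tokens can be emitted now.

    attention[i][j] is assumed to be the normalized encoder-decoder
    attention weight that candidate token i places on audio frame j
    of the input received so far.
    """
    if last_frames < 1:
        raise ValueError("last_frames must be at least 1")

    emitted = 0
    for row in attention:
        # Attention mass this token places on the newest audio frames.
        recent_mass = sum(row[-last_frames:])
        if recent_mass >= alpha:
            # The token relies on audio that may still be incomplete:
            # stop here and wait for more input.
            break
        emitted += 1
    return emitted


# Toy attention matrix: 4 candidate tokens over 4 audio frames.
attention = [
    [0.60, 0.30, 0.05, 0.05],  # focused on early frames
    [0.10, 0.60, 0.20, 0.10],  # focused on early/middle frames
    [0.05, 0.05, 0.30, 0.60],  # focused on the newest frames
    [0.70, 0.20, 0.05, 0.05],  # never reached: emission already stopped
]
emit_count = tokens_to_emit(attention)
```

In this toy case the first two tokens are emitted and the third triggers a wait, because most of its attention falls on the latest frames. Under this assumed rule, raising `alpha` makes the policy emit more eagerly (lower latency), while lowering it makes the policy wait for more audio.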

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems