Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

Chih-Chiang Chang; Hung-yi Lee

arXiv:2204.09595·cs.CL·October 5, 2022

TL;DR

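This paper models an adaptive policy for simultaneous speech translation with Continuous Integrate-and-Fire (CIF), as an alternative to the fixed wait-k policy used in most prior work. Compared with monotonic multihead attention (MMA), it offers simpler computation, better quality at low latency, and better generalization to long utterances, as shown on the MuST-C V2 dataset.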

Contribution

The paper proposes modeling adaptive policies in SimulST with CIF, offering a simpler, more effective alternative to monotonic multihead attention for low-latency translation.

Findings

01 The CIF-based method outperforms MMA in translation quality at low latency.

02 The approach generalizes better than MMA to long utterances.

03 Experiments on the MuST-C V2 dataset support the effectiveness of the approach.

Abstract

Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input is observed. A SimulST system generally includes two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies rarely explored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantages of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
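To make the contrast between a fixed and an adaptive policy concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names `wait_k_schedule` and `cif_fire` are hypothetical, the per-frame weights are assumed to lie in [0, 1] (so at most one firing per frame), and the firing threshold of 1.0 follows the usual CIF convention. How a firing is turned into a read or write decision in the actual model is not specified on this page, so the sketch only shows where firings occur.

```python
from typing import List, Tuple


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[int]:
    """Fixed wait-k policy: number of source units read before writing
    target token i (0-based). It depends only on k and i, not on content."""
    return [min(k + i, num_source) for i in range(num_target)]


def cif_fire(
    weights: List[float],
    states: List[List[float]],
    threshold: float = 1.0,
) -> List[Tuple[int, List[float]]]:
    """CIF-style accumulation (sketch): add up per-frame weights and, when
    the running sum reaches the threshold, emit ("fire") the weighted sum of
    the frames seen since the last firing. Assumes each weight is in [0, 1].

    Returns a list of (frame index at which the firing happened, vector)."""
    dim = len(states[0])
    acc = 0.0                     # weight accumulated since the last firing
    integrated = [0.0] * dim      # weighted sum of states since last firing
    fired = []
    for t, (a, h) in enumerate(zip(weights, states)):
        if acc + a < threshold:
            # Not enough weight yet: keep integrating.
            acc += a
            integrated = [x + a * y for x, y in zip(integrated, h)]
        else:
            # Use just enough of this frame's weight to reach the threshold.
            used = threshold - acc
            out = [x + used * y for x, y in zip(integrated, h)]
            fired.append((t, out))
            # The leftover weight starts the next integration.
            rest = a - used
            acc = rest
            integrated = [rest * y for y in h]
    return fired


if __name__ == "__main__":
    # Fixed policy: the schedule is the same for every utterance of this size.
    print(wait_k_schedule(num_source=5, num_target=4, k=2))
    # Adaptive behaviour: firing positions depend on the (toy) weights.
    toy_weights = [0.5, 0.25, 0.5, 0.75, 0.25]
    toy_states = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    print(cif_fire(toy_weights, toy_states))
```

In the toy input, the firings land on frames 2 and 3 because of how the weights add up. With different weights the firings would move, whereas the wait-k schedule stays the same for any input of that length.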

Code & Models

Repositories

George0828Zhang/simulst (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis