Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation

Keqi Deng; Philip C. Woodland

arXiv:2406.04541 · cs.CL · June 10, 2024

TL;DR

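This paper introduces LS-Transducer-SST, a label-synchronous neural transducer for simultaneous speech translation (SST). It uses an Auto-regressive Integrate-and-Fire (AIF) mechanism to decide when to emit translation tokens, offers a latency-controllable variant of AIF to adjust the quality-latency trade-off, and reports a better trade-off than existing methods.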

Contribution

The paper proposes a label-synchronous neural transducer for SST built on an Auto-regressive Integrate-and-Fire (AIF) mechanism. The model supports both streaming and re-ordering, can use monolingual text-only data through its prediction network, and provides controllable latency.

Findings

01 Gives a better quality-latency trade-off than existing methods.

02 Improves BLEU by 3.1 and 2.9 points on the Es-En and En-De datasets, respectively.

03 Reduces average lagging latency by 1.4 seconds.

Abstract

While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST, which naturally possesses these two properties. The LS-Transducer-SST dynamically decides when to emit translation tokens based on an Auto-regressive Integrate-and-Fire (AIF) mechanism. A latency-controllable AIF is also proposed, which can control the quality-latency trade-off either during decoding only or during both decoding and training. The LS-Transducer-SST can naturally utilise monolingual text-only data via its prediction network, which helps alleviate the key issue of data sparsity for E2E SST. During decoding, a chunk-based incremental joint decoding technique is designed to refine and expand the search space.…
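
The emission rule described in the abstract belongs to the integrate-and-fire family: per-frame weights are accumulated, and a token is emitted whenever the running sum reaches a threshold. The sketch below illustrates that general idea only. It is not the paper's AIF: the function name `integrate_and_fire`, the `threshold` parameter, and the hand-picked weights are assumptions made for illustration, and the abstract does not specify how the weights are computed or how latency control is implemented.

```python
from typing import List


def integrate_and_fire(weights: List[float], threshold: float = 1.0) -> List[int]:
    """Generic integrate-and-fire sketch (hypothetical helper, not the paper's AIF).

    Accumulates one non-negative weight per input frame and records the frame
    index each time the running sum reaches `threshold`. The leftover amount
    carries over to the next accumulation.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    fire_frames: List[int] = []
    accumulated = 0.0
    for frame_index, weight in enumerate(weights):
        accumulated += weight
        # A large weight may cross the threshold more than once in one frame.
        while accumulated >= threshold:
            fire_frames.append(frame_index)
            accumulated -= threshold
    return fire_frames


if __name__ == "__main__":
    # Illustrative weights only; a real model would predict these per frame.
    frame_weights = [0.5, 0.5, 0.25, 0.25, 0.5, 0.75]
    print(integrate_and_fire(frame_weights))                 # [1, 4]
    print(integrate_and_fire(frame_weights, threshold=1.5))  # [3]
```

In this toy setting, raising `threshold` makes the accumulator wait for more input before emitting, which gives a rough intuition for a quality-latency knob. Whether the latency-controllable AIF works this way is not stated in the abstract.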

Code & Models

Repositories

D-Keqi/LS-Transducer-SST (PyTorch, official)

Videos

Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation · Underline

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques