R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation
Jiaxin Guo, Zhanglin Wu, Zongyao Li, Hengchao Shang, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Hao Yang

TL;DR
This paper introduces R-BI, a regularized batching method that improves incremental decoding for low-latency speech translation, reducing errors and achieving state-of-the-art results with minimal BLEU score loss.
Contribution
The paper proposes a novel Regularized Batched Inputs policy that enhances incremental decoding by increasing input diversity through regularization, and that applies to both end-to-end and cascade systems.
Findings
Achieves low latency at a cost of less than 2 BLEU points.
Attains new state-of-the-art results on IWSLT SimulST tasks.
Effectively reduces output errors in incremental decoding.
Abstract
Incremental Decoding is an effective framework that enables the use of an offline model in a simultaneous setting without modifying the original model, making it suitable for Low-Latency Simultaneous Speech Translation. However, this framework may introduce errors when the system outputs from incomplete input. To reduce these output errors, several strategies such as Hold-n, LA-n, and SP-n can be employed, but the hyper-parameter needs to be carefully selected for optimal performance. Moreover, these strategies are more suitable for end-to-end systems than cascade systems. In our paper, we propose a new adaptable and efficient policy named "Regularized Batched Inputs". Our method stands out by enhancing input diversity to mitigate output errors. We suggest particular regularization techniques for both end-to-end and cascade systems. We conducted experiments on IWSLT…
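For readers unfamiliar with the baseline strategies the abstract names, the following is a minimal sketch of two of them as they are commonly described in the incremental-decoding literature: Hold-n withholds the last n tokens of the current partial hypothesis, and LA-n (local agreement) commits only the longest common prefix of n hypotheses from consecutive input chunks. This is not the R-BI policy itself, whose details are not given in the text above; the function names `hold_n` and `local_agreement` and the token-list representation are assumptions made for illustration.

```python
from typing import List, Sequence


def hold_n(hypothesis: Sequence[str], n: int) -> List[str]:
    """Hold-n (illustrative): commit the hypothesis minus its last n tokens."""
    if n <= 0:
        return list(hypothesis)
    # Slicing past the start simply yields an empty list.
    return list(hypothesis[:-n])


def local_agreement(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """LA-n (illustrative): commit the longest common prefix of the given hypotheses.

    `hypotheses` is assumed to hold the outputs for the n most recent chunks.
    """
    if not hypotheses:
        return []
    prefix: List[str] = []
    # zip stops at the shortest hypothesis, so the prefix never overruns any of them.
    for tokens in zip(*hypotheses):
        if all(tok == tokens[0] for tok in tokens):
            prefix.append(tokens[0])
        else:
            break
    return prefix


if __name__ == "__main__":
    partial = ["I", "am", "going", "to"]
    print(hold_n(partial, 2))
    print(local_agreement([["I", "am", "here"], ["I", "am", "there", "now"]]))
```

In both cases n is the hyper-parameter the abstract says must be tuned: a larger n generally delays output in exchange for fewer retracted tokens.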
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
