Streaming Transformer ASR with Blockwise Synchronous Beam Search
Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe

TL;DR
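This paper introduces a blockwise synchronous beam search algorithm for streaming Transformer-based speech recognition, which decodes as encoder blocks arrive rather than waiting for the full utterance. On Mandarin, English, and Japanese benchmarks it outperforms conventional online Transformer ASR methods and reduces response time.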
Contribution
The paper presents a blockwise synchronous beam search for streaming Transformer ASR: the encoder processes the input in blocks, and the decoder detects block boundaries by checking each hypothesis for end-of-sequence or repeated tokens.
Findings
Outperforms conventional online Transformer ASR methods.
Reduces response time through blockwise processing.
Achieves accuracy comparable to or better than batch models.
Abstract
The Transformer self-attention network has shown promising performance as an alternative to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) systems. However, the Transformer has a drawback in that the entire input sequence is required to compute both self-attention and source-target attention. In this paper, we propose a novel blockwise synchronous beam search algorithm based on blockwise processing of the encoder to perform streaming E2E Transformer ASR. In the beam search, encoded feature blocks are synchronously aligned using a block boundary detection technique, where a reliability score of each predicted hypothesis is evaluated based on the end-of-sequence and repeated tokens in the hypothesis. Evaluations of the HKUST and AISHELL-1 Mandarin, LibriSpeech English, and CSJ Japanese tasks show that the proposed streaming Transformer algorithm outperforms…
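The block boundary detection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses greedy decoding instead of a beam, a boolean check instead of a reliability score, and the names `is_unreliable`, `decode_blockwise`, `make_toy_decoder`, and the `next_token` callback are all hypothetical. The idea it shows is that an end-of-sequence or repeated token predicted before the last block is treated as a sign that the decoder has run past the encoded input, so the token is discarded and decoding waits for the next block.

```python
from typing import Callable, List, Sequence

EOS = 0  # hypothetical end-of-sequence token id


def is_unreliable(hyp: Sequence[int], eos_id: int = EOS) -> bool:
    """True if the newest token is EOS or repeats an earlier token."""
    if not hyp:
        return False
    last = hyp[-1]
    return last == eos_id or last in hyp[:-1]


def decode_blockwise(
    num_blocks: int,
    next_token: Callable[[List[int], int], int],
    eos_id: int = EOS,
    max_len: int = 50,
) -> List[int]:
    """Greedy sketch of block-synchronous decoding (assumption: beam size 1).

    next_token(prefix, blocks_seen) stands in for the decoder: it returns the
    best next token given the prefix and the encoder blocks received so far.
    """
    hyp: List[int] = []
    blocks_seen = 1
    while len(hyp) < max_len:
        tok = next_token(hyp, blocks_seen)
        candidate = hyp + [tok]
        if is_unreliable(candidate, eos_id):
            if blocks_seen < num_blocks:
                # Likely past the block boundary: drop the token, wait for more input.
                blocks_seen += 1
                continue
            if tok == eos_id:
                break  # last block already seen, so EOS is taken as genuine
            # On the last block a repeated token is accepted as genuine.
        hyp = candidate
    return hyp


def make_toy_decoder(tokens: List[int], block_of: List[int]):
    """Toy stand-in decoder: emits EOS once it runs out of visible tokens."""
    def next_token(prefix: List[int], blocks_seen: int) -> int:
        visible = [t for t, b in zip(tokens, block_of) if b <= blocks_seen]
        return visible[len(prefix)] if len(prefix) < len(visible) else EOS
    return next_token


toy = make_toy_decoder(tokens=[5, 7, 9], block_of=[1, 2, 3])
result = decode_blockwise(num_blocks=3, next_token=toy)  # [5, 7, 9]
```

In the toy run, each premature EOS causes the loop to advance to the next block instead of ending the hypothesis, so the full sequence is recovered only after all three blocks have arrived.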
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Knowledge Distillation
