SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, Shinji Watanabe

TL;DR
This paper shows that SRU++, an architecture combining fast recurrence and attention, achieves speech recognition performance competitive with the Conformer model and outperforms it on long-form speech inputs.
Contribution
The paper applies SRU++ to ASR, showing its advantages over Conformer, especially for long-form speech, and provides a comprehensive comparison across multiple benchmarks.
Findings
SRU++ achieves 2.0% / 4.7% word error rate (WER) on the LibriSpeech test-clean / test-other sets.
SRU++ surpasses Conformer on long-form speech inputs.
SRU++ demonstrates improved efficiency and performance in ASR tasks.
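
The WER figures above are word error rates: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric, not the paper's scoring pipeline; the function name `word_error_rate` and the example sentences are illustrative assumptions, and text normalization (casing, punctuation) is left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; benchmark results are normally
    produced with an established scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A corpus-level WER such as the one reported for LibriSpeech is usually the total number of errors over all utterances divided by the total number of reference words, rather than the mean of per-utterance rates.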
Abstract
The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with the attention mechanism, SRU++ exhibits strong capability in sequence modeling and achieves near-state-of-the-art results in various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Absolute Position Encodings · Position-Wise Feed-Forward Layer
