SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition

Jing Pan; Tao Lei; Kwangyoun Kim; Kyu Han; Shinji Watanabe

arXiv:2110.05571·eess.AS·October 13, 2021

TL;DR

This paper demonstrates that SRU++, a novel architecture combining fast recurrence and attention, achieves competitive speech recognition performance and excels on long-form speech inputs compared to the Conformer model.

Contribution

The paper applies SRU++ to ASR, showing its advantages over Conformer, especially for long-form speech, and provides a comprehensive comparison across multiple benchmarks.

Findings

1. SRU++ achieves 2.0% / 4.7% WER on LibriSpeech test sets.
2. SRU++ surpasses Conformer on long-form speech inputs.
3. SRU++ demonstrates improved efficiency and performance in ASR tasks.
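
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between reference and hypothesis transcripts (substitutions + deletions + insertions) divided by the number of reference words. The sketch below illustrates that standard definition only. This page does not describe the paper's scoring pipeline, so the function names `edit_distance` and `wer` are hypothetical, and the whitespace tokenization with no text normalization is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words.

    Assumption: transcripts are split on whitespace with no normalization.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_errors += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(wer(["the cat sat on the mat"], ["the cat sat on mat"]))
```

Under this definition, a reported WER of 2.0% corresponds to roughly two word errors per hundred reference words, aggregated over the whole test set rather than averaged per utterance.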

Abstract

The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong capability in sequence modeling and achieves near-state-of-the-art results in various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Absolute Position Encodings · Position-Wise Feed-Forward Layer