Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition

Xiong Wang; Sining Sun; Lei Xie; Long Ma

arXiv:2106.09236·cs.SD·June 18, 2021·5 cites

TL;DR

This paper introduces a prob-sparse self-attention mechanism into Conformer models for speech recognition, speeding up inference by 8% to 45% and reducing memory usage by 15% to 45% without sacrificing accuracy.
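The speed and memory savings come from scoring fewer query-key pairs. The minimal sketch below counts score-matrix entries for a sequence of L frames. It assumes the Informer-style rule that only the top u = ceil(c * ln L) queries receive full attention; the function name `attention_score_entries` and the constant c = 5 are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def attention_score_entries(seq_len: int, c: float = 5.0) -> tuple[int, int]:
    """Count query-key score entries for full vs. prob-sparse self-attention.

    Full self-attention scores every query against every key: L * L entries.
    The prob-sparse variant (assumed here to follow Informer) keeps only the
    top u = ceil(c * ln L) "active" queries, so it scores u * L entries.
    The constant c is a hypothetical choice for illustration.
    """
    full = seq_len * seq_len
    u = min(seq_len, math.ceil(c * math.log(seq_len)))
    sparse = u * seq_len
    return full, sparse


for L in (100, 500, 1000):
    full, sparse = attention_score_entries(L)
    print(f"L={L:5d}  full={full:9d}  sparse={sparse:7d}  ratio={sparse / full:.3f}")
```

These ratios count only attention-score entries. They are not comparable to the end-to-end inference speed-ups reported in the paper, which cover the whole model.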

Contribution

It introduces a prob-sparse self-attention mechanism into Conformer models for ASR, using a Kullback-Leibler (KL) divergence based sparsity measurement for each query to decide whether the attention function is computed for that query.
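A KL-divergence based query sparsity measurement can be sketched as follows. This assumes the formulation from the Informer model, where prob-sparse attention originated. There, M(q_i, K) = logsumexp_j(s_ij) - mean_j(s_ij) with s_ij = q_i . k_j / sqrt(d), which equals KL(uniform || softmax(s_i)) + ln L_K. The function names and the max-mean approximation `approx_sparsity` are illustrative assumptions, not code from the paper.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def query_sparsity(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Informer-style sparsity measurement M(q_i, K) for every query row.

    A larger value means the query's attention distribution is farther from
    uniform, i.e. the query is more "active" and worth computing exactly.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # shape (L_Q, L_K)
    return logsumexp(scores, axis=-1) - scores.mean(axis=-1)


def approx_sparsity(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    # Max-mean approximation: replaces logsumexp with max (a lower bound on M).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return scores.max(axis=-1) - scores.mean(axis=-1)


def kl_uniform_to_softmax(scores: np.ndarray) -> np.ndarray:
    """KL(uniform || softmax(scores)) per row, computed directly for comparison."""
    L_k = scores.shape[-1]
    log_p = scores - logsumexp(scores, axis=-1)[..., None]
    return np.mean(np.log(1.0 / L_k) - log_p, axis=-1)


rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
M = query_sparsity(Q, K)
kl = kl_uniform_to_softmax(Q @ K.T / np.sqrt(8))
print(np.allclose(M, kl + np.log(6)))  # the two forms agree up to the constant ln L_K
```

Because ln L_K is the same for every query, ranking queries by M is equivalent to ranking them by their KL divergence from a uniform attention distribution.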

Findings

01 Achieves an 8% to 45% inference speed-up.
02 Reduces memory usage by 15% to 45%.
03 Maintains the same error rate as the standard Conformer.

Abstract

End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, and self-attention plays a vital role in both by capturing important global information. However, the time and memory complexity of self-attention grows quadratically with the length of the input sequence. In this paper, a prob-sparse self-attention mechanism is introduced into Conformer to sparsify the computation of self-attention, accelerating inference and reducing memory consumption. Specifically, we adopt a Kullback-Leibler divergence based sparsity measurement for each query to decide whether to compute the attention function for that query. By using the prob-sparse attention mechanism, we achieve an impressive 8% to 45% inference speed-up and…
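The per-query decision described in the abstract can be sketched as a single-head forward pass. This is a minimal sketch assuming the Informer recipe: rank queries by the max-mean sparsity measurement, give the top u = ceil(c * ln L_Q) queries full softmax attention, and let every other ("lazy") query output the mean of V. The function name `prob_sparse_attention` and the constant c are hypothetical, and the sketch omits multi-head projections and Conformer's relative positional encoding.

```python
import numpy as np


def prob_sparse_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, c: float = 5.0):
    """Single-head prob-sparse self-attention (Informer-style sketch, not the paper's code).

    Returns the output matrix and the indices of the queries that received
    full attention.
    """
    L_q, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)  # full scores, computed here only for clarity
    # Max-mean sparsity measurement: how far each query is from uniform attention.
    m_bar = scores.max(axis=-1) - scores.mean(axis=-1)
    # Number of active queries: u = ceil(c * ln L_Q), at least 1, at most L_Q.
    u = min(L_q, max(1, int(np.ceil(c * np.log(L_q)))))
    top = np.argsort(-m_bar)[:u]  # indices of the u most "active" queries

    # Lazy queries: output the mean of the value vectors.
    out = np.tile(V.mean(axis=0), (L_q, 1))
    # Active queries: ordinary (numerically stable) softmax attention.
    s = scores[top]
    s = s - s.max(axis=-1, keepdims=True)
    w = np.exp(s)
    w /= w.sum(axis=-1, keepdims=True)
    out[top] = w @ V
    return out, top


rng = np.random.default_rng(1)
L, d = 64, 8
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out, top = prob_sparse_attention(Q, K, V, c=1.0)
print(out.shape, len(top))  # 64 output rows; only the top-u queries were computed exactly
```

For readability, this sketch still builds the full L x L score matrix, so it demonstrates the selection logic rather than the savings. In Informer, the measurement is instead estimated from a random sample of keys, so the full matrix is never materialized.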

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling