Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
Xiong Wang, Sining Sun, Lei Xie, Long Ma

TL;DR
This paper introduces a prob-sparse self-attention mechanism into Conformer models for speech recognition, significantly speeding up inference and reducing memory usage without sacrificing accuracy.
Contribution
It proposes a novel prob-sparse attention method based on KL divergence to efficiently sparsify self-attention in Conformer models for ASR.
Findings
Achieves 8% to 45% inference speed-up.
Reduces memory usage by 15% to 45%.
Maintains the same error rate as standard Conformer.
Abstract
End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, with self-attention playing a vital role in capturing important global information. However, the time and memory complexity of self-attention grows quadratically with the length of the sentence. In this paper, a prob-sparse self-attention mechanism is introduced into Conformer to sparsify the self-attention computation, in order to accelerate inference and reduce memory consumption. Specifically, we adopt a Kullback-Leibler divergence based sparsity measurement for each query to decide whether to compute the attention function on that query. By using the prob-sparse attention mechanism, we achieve an 8% to 45% inference speed-up and…
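The per-query selection described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes the sparsity measurement is the KL divergence from a uniform distribution to each query's attention distribution, which reduces to logsumexp(scores) - mean(scores) - log(n). It also assumes that queries which are not selected fall back to the mean of the values; this page does not say how the paper treats them. The names `kl_uniform_to_attention`, `prob_sparse_attention`, and `num_active` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_uniform_to_attention(scores):
    """KL(U || softmax(scores)) = logsumexp(scores) - mean(scores) - log(n).

    A value near 0 means the query attends almost uniformly, so computing
    its attention adds little beyond an average of the values.
    """
    n = len(scores)
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return lse - sum(scores) / n - math.log(n)


def prob_sparse_attention(Q, K, V, num_active):
    """Run softmax attention only for the `num_active` least-uniform queries.

    Assumption (not from the source): the remaining queries output the
    mean of V instead of a full attention result.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    # Scaled dot-product scores, one row per query.
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]
    sparsity = [kl_uniform_to_attention(row) for row in scores]
    ranked = sorted(range(len(Q)), key=lambda i: sparsity[i], reverse=True)
    active = set(ranked[:num_active])
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    out = []
    for i, row in enumerate(scores):
        if i in active:
            w = softmax(row)
            out.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
        else:
            out.append(list(mean_v))
    return out, sorted(active)


Q = [[0.0, 0.0], [4.0, 0.0]]   # query 0 scores every key equally
K = [[1.0, 0.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out, active = prob_sparse_attention(Q, K, V, num_active=1)
```

For clarity this sketch scores every query against every key before ranking, so it shows only the selection rule and saves no computation. A practical implementation would need a cheaper estimate of the measurement to obtain the speed and memory gains reported above.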
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
