TL;DR
This paper introduces local dense synthesizer attention (LDSA) for Transformer-based speech recognition. LDSA restricts attention to a local window around each frame, which reduces computation, and combining it with self-attention captures local and global context together, which improves accuracy.
Contribution
It proposes LDSA as an alternative to self-attention, and combines LDSA with self-attention to enhance speech recognition performance while lowering computational costs.
Findings
The LDSA-Transformer achieves a character error rate (CER) of 6.49% on AISHELL-1, better than the SA-Transformer.
Combining LDSA with SA reduces the CER further to 6.18%, also outperforming the SA-Transformer.
LDSA requires less computation than standard self-attention.
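To make the computation claim concrete, here is a back-of-the-envelope sketch in Python. It is not taken from the paper: the function name `attention_mixing_cost`, the choice to count only multiply-adds for forming attention weights and mixing values, and the example sizes (500 frames, model dimension 256, window of 31 frames) are all assumptions made for illustration. Projections, multiple heads, and constant factors are ignored.

```python
def attention_mixing_cost(num_frames, model_dim, context=None):
    """Rough multiply-add count for forming attention weights and mixing values.

    Hypothetical helper for illustration only.
    context=None models full self-attention, where every frame scores every frame.
    An integer models a local window of that many frames around each frame.
    Input projections are ignored in both cases.
    """
    span = num_frames if context is None else context
    weight_cost = num_frames * span * model_dim  # one score per (frame, position in span)
    mixing_cost = num_frames * span * model_dim  # weighted sum of value vectors
    return weight_cost + mixing_cost


# Illustrative sizes (assumed, not reported in the paper).
full = attention_mixing_cost(500, 256)
local = attention_mixing_cost(500, 256, context=31)
print(full, local, full / local)
```

Under these assumptions the full-attention cost grows with the square of the utterance length, while the local cost grows linearly, so the gap widens for longer utterances.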
Abstract
Recently, several studies reported that dot-product self-attention (SA) may not be indispensable to the state-of-the-art Transformer models. Motivated by the fact that dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, achieved competitive results in many language processing tasks, in this paper, we first propose DSA-based speech recognition as an alternative to SA. To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current central frame for speech recognition. Finally, we combine LDSA with SA to extract the local and global information simultaneously. Experimental results on the AISHELL-1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, which is…
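The mechanism the abstract describes can be sketched in a few lines of plain Python. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the two-layer form of the weight predictor (a linear layer, ReLU, then a second linear layer and softmax), zero padding at the utterance edges, and all names (`local_dense_synthesizer_attention`, `W1`, `W2`, `W3`, `context`) are assumptions. The point it shows is that each frame predicts its own attention weights from its own features, with no dot products between frames, and that those weights cover only a fixed window around the frame.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def local_dense_synthesizer_attention(X, W1, W2, W3, context):
    """Single-head LDSA sketch (hypothetical helper, not the authors' code).

    X: list of T frames, each a list of d floats.
    W1: d x d, W2: context x d, W3: d x d weight matrices (lists of rows).
    context: odd window width; each frame mixes that many neighbouring frames.
    """
    T, d = len(X), len(X[0])
    half = context // 2
    values = [matvec(W3, x) for x in X]  # value projection of every frame
    outputs = []
    for t, x in enumerate(X):
        hidden = [max(0.0, h) for h in matvec(W1, x)]  # ReLU
        weights = softmax(matvec(W2, hidden))  # predicted from frame t alone
        y = [0.0] * d
        for j, w in enumerate(weights):
            s = t + j - half  # absolute index of the j-th frame in the window
            if 0 <= s < T:  # assumption: frames outside the utterance act as zeros
                for k in range(d):
                    y[k] += w * values[s][k]
        outputs.append(y)
    return outputs


# Toy check: zero W2 gives uniform weights, so each output is a windowed average.
X = [[1.0], [2.0], [3.0]]
Y = local_dense_synthesizer_attention(
    X, W1=[[1.0]], W2=[[0.0], [0.0], [0.0]], W3=[[1.0]], context=3
)
print(Y)
```

In the toy check, the predicted weights are uniform, so each output is the average over a three-frame window with zeros beyond the edges. Combining this with self-attention, as the abstract proposes, would add a global branch alongside this local one; that combination is not sketched here.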
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Label Smoothing
