Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention

Menglong Xu; Shengqiang Li; Xiao-Lei Zhang

arXiv:2010.12155·cs.SD·July 27, 2021

TL;DR

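This paper introduces a local dense synthesizer attention (LDSA) mechanism for Transformer-based speech recognition. LDSA restricts dense synthesizer attention to a local window around each frame, which reduces computation; combining LDSA with self-attention captures local and global context together and further improves accuracy.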

Contribution

It proposes LDSA as an alternative to self-attention, and combines LDSA with self-attention to enhance speech recognition performance while lowering computational costs.

Findings

01

The LDSA-Transformer achieves a CER of 6.49%, better than the SA-Transformer.

02

Combining LDSA with SA reduces the CER to 6.18%, outperforming the SA-Transformer.

03

LDSA requires less computation than standard self-attention.
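To make the computation claim concrete, here is a rough multiply-add count for a single attention head. It is a sketch under stated assumptions, not the paper's own accounting: it ignores softmax, biases, and the output projection, and the sizes used (500 frames, model dimension 256, context width 15) are illustrative values, not figures from the paper. The function name `attention_multiply_adds` is hypothetical.

```python
def attention_multiply_adds(num_frames, dim, context=None):
    """Rough multiply-add count for one attention head (assumed cost model).

    context=None counts dot-product self-attention over all frames;
    an integer context counts a local dense synthesizer layer that
    predicts `context` weights per frame.
    """
    projection = num_frames * dim * dim  # one (T, d) x (d, d) product
    if context is None:
        # Q, K, V projections, then the T x T score matrix and weighted sum.
        return 3 * projection + 2 * num_frames * num_frames * dim
    # Hidden projection and value projection, then the (d, context) weight
    # prediction and the weighted sum over `context` neighbouring frames.
    return 2 * projection + 2 * num_frames * dim * context


sa_cost = attention_multiply_adds(500, 256)
ldsa_cost = attention_multiply_adds(500, 256, context=15)
print(sa_cost, ldsa_cost)
```

Under this cost model the self-attention term grows quadratically with the number of frames, while the local term grows linearly, so the gap widens for longer utterances.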

Abstract

Recently, several studies reported that dot-product self-attention (SA) may not be indispensable to state-of-the-art Transformer models. Motivated by the fact that dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, achieved competitive results in many language processing tasks, we first propose a DSA-based speech recognition model as an alternative to SA. To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current central frame. Finally, we combine LDSA with SA to extract local and global information simultaneously. Experimental results on the AISHELL-1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, which is…
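The mechanism described in the abstract can be sketched in a few lines of NumPy. This is a minimal single-head illustration under assumptions, not the authors' implementation: the weight names `w1`, `w2`, `w3`, the ReLU hidden layer, and zero padding at the utterance edges are choices made here for the example. Each frame predicts its own `context` attention weights from its features alone (no dot products between frames) and uses them to mix the value vectors of its neighbours.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def local_dense_synthesizer_attention(x, w1, w2, w3, context):
    """x: (T, d) frames; w1, w3: (d, d); w2: (d, context). Returns (T, d)."""
    num_frames = x.shape[0]
    half = context // 2
    # Attention weights are synthesized per frame, one weight per position
    # in the local window (assumed form: softmax(relu(x w1) w2)).
    weights = softmax(np.maximum(x @ w1, 0.0) @ w2, axis=-1)  # (T, context)
    values = x @ w3                                           # (T, d)
    # Zero-pad in time so every frame has a full window (an assumption).
    padded = np.pad(values, ((half, context - 1 - half), (0, 0)))
    out = np.zeros_like(values)
    for j in range(context):
        # padded[j + t] is the value of frame t + j - half.
        out += weights[:, j:j + 1] * padded[j:j + num_frames]
    return out


rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 4))
y = local_dense_synthesizer_attention(
    frames,
    rng.standard_normal((4, 4)),
    rng.standard_normal((4, 3)),
    rng.standard_normal((4, 4)),
    context=3,
)
print(y.shape)
```

The weight matrix here has shape (T, context) rather than (T, T), which is where the saving over full self-attention comes from; a global SA branch would be needed alongside it to recover long-range context, as the abstract's combined model does.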

Code & Models

Repositories

mlxu995/multihead-LDSA (PyTorch, official)

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Label Smoothing