DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

Zhao You; Dan Su; Jie Chen; Chao Weng; Dong Yu

arXiv:1910.13282 · eess.AS · October 30, 2019 · 5 cites

TL;DR

This paper combines DFSMN with self-attention layers and augments the self-attention layers with additional memory, so the model can draw on context beyond the current utterance and reduce character error rate (CER).

Contribution

The paper proposes DFSMN-SAN, an architecture that combines DFSMN with self-attention layers to form a stronger baseline than a purely self-attention network, and then adds and compares two kinds of additional memory structures in the self-attention layers.

Findings

01. DFSMN-SAN outperforms vanilla SAN by a relative 5% in character error rate (CER).

02. The additional memory structures give a further 5-11% relative improvement in CER.

03. Model achieves state-of-the-art results on large-scale LVCSR tasks.

Abstract

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and have achieved state-of-the-art performance owing to their superior ability to capture long-term dependencies. A key ingredient is the self-attention mechanism, which can be applied effectively over the whole utterance. In this paper, we investigate whether information beyond the whole utterance can also be exploited to benefit recognition. We propose applying self-attention layers with augmented memory to ASR. Specifically, we first propose a variant architecture that combines a deep feed-forward sequential memory network (DFSMN) with self-attention layers to form a stronger baseline than a purely self-attention network. We then propose and compare two kinds of additional memory structures added to the self-attention layers. Experiments on large-scale LVCSR…
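
The abstract does not say how the memory is attached to the self-attention layers. As a rough illustration only, the sketch below assumes one common form of memory-augmented attention: a small set of extra key and value vectors is appended to the utterance's own keys and values, so every frame can attend to them as well as to the other frames. The names `attention_with_memory`, `mem_keys`, and `mem_values` are hypothetical, the memory here is random rather than learned, and the frames are used directly as queries, keys, and values without learned projections. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_with_memory(queries, keys, values, mem_keys, mem_values):
    """Single-head scaled dot-product attention.

    Assumption for illustration: memory slots are simply appended to the
    keys and values, so each query attends over T frames plus M slots.
    """
    all_keys = keys + mem_keys
    all_values = values + mem_values
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(all_values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in all_keys])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, all_values)) for d in range(dim)]
        )
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, M, D = 6, 2, 4                 # frames, memory slots, feature size (toy values)
frames = random_matrix(T, D, rng)
mem_k = random_matrix(M, D, rng)  # stand-ins for learned memory keys
mem_v = random_matrix(M, D, rng)  # stand-ins for learned memory values

out = attention_with_memory(frames, frames, frames, mem_k, mem_v)
print(len(out), len(out[0]))      # 6 4
```

The output keeps one vector per input frame; only the set of positions each frame attends over grows from T to T + M. Passing empty lists for the memory reduces the function to ordinary self-attention over the utterance.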

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Test · Memory Network