TL;DR
This paper adds a neural Turing machine (NTM) memory between the encoder and decoder of a conformer to improve end-to-end speech recognition on long utterances, where attention-based models tend to degrade.
Contribution
The paper proposes Conformer-NTM, a conformer architecture with a fully differentiable external memory placed between the encoder and decoder, to improve long-utterance recognition in ASR systems.
Findings
Outperforms the baseline conformer on long Librispeech utterances
Memory augmentation improves generalization to longer speech segments
Demonstrates the effectiveness of a neural Turing machine in an ASR context
Abstract
Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the…
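The store-and-retrieve behaviour described in the abstract can be illustrated with the content-based addressing, read, and write operations of the original NTM formulation. This is a minimal sketch in plain Python, not the paper's implementation: the function names (`content_weights`, `read`, `write`), the memory size, and the sharpening factor `beta` are all assumptions chosen for illustration, and the location-based addressing steps and the controller network are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_weights(memory, key, beta):
    """Content-based addressing: softmax over beta-scaled cosine similarities."""
    return softmax([beta * cosine(key, row) for row in memory])


def read(memory, weights):
    """Read vector: weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def write(memory, weights, erase, add):
    """Erase-then-add write; returns a new memory and leaves the input untouched."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Hypothetical toy memory with 3 slots of width 2 (values chosen for illustration).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
key = [1.0, 0.0]

w = content_weights(memory, key, beta=10.0)  # focuses on the slot most similar to the key
r = read(memory, w)                          # retrieved vector
new_memory = write(memory, w, erase=[1.0, 1.0], add=[0.5, 0.5])
```

Every operation here is a smooth function of its inputs, which is what lets such a memory be trained end to end with the rest of the network. In a full model the key, `beta`, erase vector, and add vector would be emitted by a learned controller rather than fixed by hand.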
Taxonomy
Methods: Softmax · Sigmoid Activation · Tanh Activation · Content-based Attention · Location-based Attention · Long Short-Term Memory · Neural Turing Machine
