State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions

Kyu J. Han; Ramon Prieto; Kaixing Wu; Tao Ma

arXiv:1910.00716·cs.CL·October 3, 2019

TL;DR

This paper introduces a multi-stream self-attention neural network architecture with dilated 1D convolutions for speech recognition, achieving state-of-the-art results on LibriSpeech.

Contribution

It proposes a novel multi-stream self-attention model with dilated convolutions to better handle correlated speech frames, improving speech recognition accuracy.

Findings

01

Achieved 2.2% WER on LibriSpeech test-clean dataset.

02

Outperforms previously reported models on the LibriSpeech benchmark.

03

Demonstrates efficiency of multi-resolution attention in speech tasks.
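The WER (word error rate) figure in the findings is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard computation on a toy sentence pair. The function name and the example sentences are illustrative only and do not come from the paper, which may apply its own scoring tools and text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example (not from the paper): one substitution plus one deletion
# against a six-word reference gives 2 / 6.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")  # 0.333
```

Read this way, a WER of 2.2% corresponds to roughly 22 word errors per 1,000 reference words.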

Abstract

Self-attention has been a huge success for many downstream tasks in NLP, which has led to exploration of applying self-attention to speech problems as well. The efficacy of self-attention in speech applications, however, does not yet seem fully realized, since it is challenging to handle highly correlated speech frames in the context of self-attention. In this paper we propose a new neural network model architecture, namely multi-stream self-attention, to address this issue and thus make the self-attention mechanism more effective for speech recognition. The proposed model architecture consists of parallel streams of self-attention encoders, and each stream has layers of 1D convolutions with dilated kernels whose dilation rates are unique to that stream, followed by a self-attention layer. The self-attention mechanism in each stream pays attention to only one resolution of input speech frames and the…
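The stream structure described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The dilation rates (1, 2, 3), the fixed smoothing kernel, the identity query/key/value projections, and the single attention head are all hypothetical simplifications. The abstract is truncated before it says how the streams are combined, so the per-frame concatenation at the end is an assumption as well.

```python
import math
import random


def dilated_conv1d(frames, kernel, dilation):
    """1D convolution over time with a dilated kernel, zero-padded to keep T frames.

    frames: list of T feature vectors (lists of D floats).
    kernel: list of K scalar taps shared across feature dimensions (a simplification).
    """
    num_frames, dim, center = len(frames), len(frames[0]), len(kernel) // 2
    out = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t + (k - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= src < num_frames:
                for d in range(dim):
                    acc[d] += weight * frames[src][d]
        out.append(acc)
    return out


def self_attention(frames):
    """Single-head scaled dot-product attention with identity Q/K/V (assumption)."""
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for query in frames:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in frames]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)])
    return out


def multi_stream_encoder(frames, dilation_rates, kernel=(0.25, 0.5, 0.25)):
    """One stream per dilation rate: dilated convolution, then self-attention."""
    streams = []
    for rate in dilation_rates:
        hidden = dilated_conv1d(frames, list(kernel), rate)
        streams.append(self_attention(hidden))
    # Assumption: combine streams by concatenating their outputs frame by frame.
    return [sum((stream[t] for stream in streams), []) for t in range(len(frames))]


random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(12)]
encoded = multi_stream_encoder(frames, dilation_rates=[1, 2, 3])
print(len(encoded), len(encoded[0]))  # 12 12
```

Each stream sees the same input at a different temporal spacing, so its attention layer operates on a single resolution of the frame sequence. With three streams of four dimensions each, the concatenated output has twelve features per frame.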

Code & Models

Repositories

s-omranpour/Pytorch-Speech-Recognition
pytorch
