End-To-End Speech Recognition Using A High Rank LSTM-CTC Based Model

Yangyang Shi; Mei-Yuh Hwang; Xin Lei

arXiv:1903.05261 · cs.CL · March 14, 2019 · 1 citation

TL;DR

This paper introduces a high rank projection layer in LSTM-CTC models for speech recognition. The layer improves the models' expressiveness and reduces word error rate by 4%-6% relative on WSJ and LibriSpeech, without external data or augmentation.

Contribution

The paper proposes a novel high rank projection layer to enhance LSTM-CTC models' expressiveness in end-to-end speech recognition.

Findings

1. Achieves 4%-6% relative WER reduction on the WSJ and LibriSpeech datasets.

2. Outperforms other published CTC-based end-to-end models without external data.

3. Code is publicly available for reproducibility.
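
The relative WER reduction quoted in the findings above is a ratio against the baseline error rate, not a difference in absolute percentage points. A minimal sketch of that arithmetic follows; the function name and the baseline and improved WER values are hypothetical, chosen only for illustration, and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not taken from the paper).
baseline_wer = 10.0   # baseline system WER, in percent
improved_wer = 9.5    # improved system WER, in percent

reduction = relative_wer_reduction(baseline_wer, improved_wer)
print(f"absolute reduction: {baseline_wer - improved_wer:.1f} points")
print(f"relative reduction: {reduction:.1%}")
```

With these made-up values, a drop of half a percentage point corresponds to a relative reduction inside the 4%-6% range reported above.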

Abstract

Long Short-Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition due to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck projection matrix maps the hidden feature vectors obtained from the LSTM to the softmax output layer. In this paper, we propose to use a high rank projection layer to replace the projection matrix. The output of the high rank projection layer is a weighted combination of vectors that are projected from the hidden feature vectors via different projection matrices and a non-linear activation function. The high rank projection layer is able to improve the expressiveness of LSTM-CTC models. The experimental results show that on the Wall Street Journal (WSJ) corpus and the LibriSpeech data set, the proposed method achieves 4%-6% relative word error rate (WER)…
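
The layer described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract only, not the authors' implementation: the choice of tanh as the non-linearity, the softmax gate that produces the combination weights from the hidden vector, and all names and dimensions (`proj_mats`, `gate_mat`, `out_mat`, `T`, `d`, `p`, `K`, `V`) are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def high_rank_projection(h, proj_mats, gate_mat):
    """Weighted combination of K non-linear projections of h.

    h:         (T, d) hidden vectors, one per frame (e.g. LSTM outputs)
    proj_mats: (K, d, p) one projection matrix per component
    gate_mat:  (d, K) matrix producing the combination weights (assumed form)
    returns:   (T, p) projected features
    """
    # K candidate projections per frame, each passed through tanh (assumed).
    candidates = np.tanh(np.einsum("td,kdp->tkp", h, proj_mats))  # (T, K, p)
    # Per-frame combination weights that sum to 1 over the K components.
    weights = softmax(h @ gate_mat, axis=-1)                      # (T, K)
    # Weighted sum over the K candidates.
    return np.einsum("tk,tkp->tp", weights, candidates)           # (T, p)


# Toy dimensions, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
T, d, p, K, V = 5, 16, 8, 4, 30
h = rng.standard_normal((T, d))
proj_mats = rng.standard_normal((K, d, p)) * 0.1
gate_mat = rng.standard_normal((d, K)) * 0.1
out_mat = rng.standard_normal((p, V)) * 0.1  # maps to V output labels

projected = high_rank_projection(h, proj_mats, gate_mat)
posteriors = softmax(projected @ out_mat)  # per-frame label posteriors for CTC
print(projected.shape, posteriors.shape)
```

Because the combination weights depend on the input frame and each component passes through a non-linearity, the mapping from hidden vector to output is no longer a single fixed linear map, which is one way to read the "high rank" claim. With `K = 1` the sketch reduces to a single tanh projection.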

Code & Models

Repositories

mobvoi/lstm_ctc (TensorFlow, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Softmax