A Study of All-Convolutional Encoders for Connectionist Temporal Classification

Kalpesh Krishna; Liang Lu; Kevin Gimpel; Karen Livescu

arXiv:1710.10398·cs.CL·February 16, 2018

TL;DR

This paper investigates replacing RNNs with deep convolutional neural networks as encoders in CTC-based speech recognition, demonstrating faster training and decoding with comparable accuracy.

Contribution

It introduces CNN-based encoders for CTC in speech recognition, showing they are efficient alternatives to RNNs with similar performance.

Findings

01

CNN encoders are faster to train and decode than RNNs.

02

CNN models achieve word error rates comparable to those of LSTMs.

03

CNNs significantly reduce training and decoding times.
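
The accuracy comparison above is stated in terms of word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper or its evaluation scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.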

Abstract

Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition. In particular, we explore a range of one-dimensional convolutional layers, which are particularly efficient. We compare the performance of our CNN-based models against typical RNN-based models in terms of training time, decoding time, model size and word error rate (WER) on the Switchboard Eval2000 corpus. We find…
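
The remark that CNNs lack an explicit representation of the entire sequence can be made concrete with receptive-field arithmetic: each output of a stack of one-dimensional convolutions depends only on a bounded window of input frames. The sketch below computes that window for a given stack. The function name `receptive_field` and the layer settings in the example are illustrative assumptions, not the configurations studied in the paper.

```python
def receptive_field(kernel_sizes, strides=None):
    """Number of input frames seen by one output unit of stacked 1-D convolutions."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    if len(strides) != len(kernel_sizes):
        raise ValueError("kernel_sizes and strides must have the same length")

    rf = 1    # before any layer, one unit corresponds to one input frame
    jump = 1  # spacing, in input frames, between adjacent units at the current depth
    for kernel, stride in zip(kernel_sizes, strides):
        rf += (kernel - 1) * jump  # each layer widens the window by (kernel - 1) units
        jump *= stride             # striding spreads later units further apart
    return rf


if __name__ == "__main__":
    # Hypothetical stack: ten stride-1 layers with kernel width 3.
    print(receptive_field([3] * 10))
```

With stride 1 throughout, the window grows only linearly with depth (1 + layers × (kernel − 1)), so covering a long utterance requires many layers, wider kernels, or striding, whereas a bidirectional RNN state can in principle depend on every frame.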
