Exploring End-to-End Techniques for Low-Resource Speech Recognition

Vladimir Bataev; Maxim Korenevsky; Ivan Medennikov; Alexander Zatvornitskiy

arXiv:1807.00868 · cs.SD · July 4, 2018

TL;DR

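This paper presents a simple end-to-end, grapheme-based speech recognition system trained on low-resource Turkish data (Babel, 80 hours of spontaneous speech). It compares several neural architectures and proposes a CTC-loss modification; the best model reaches 45.8% WER, which the authors report as the best result for end-to-end systems using in-domain data on this task.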

Contribution

The paper proposes a CTC-loss modification that uses segmentation during training, and compares multiple neural architectures, input features, and normalization techniques for low-resource speech recognition.

Findings

01

The best model achieved a 45.8% word error rate (WER) on the Babel Turkish spontaneous speech task.

02

A CTC-loss modification that uses segmentation during training improves decoding with a small beam size.

03

ResNet with GRU architecture performed best among tested models.

Abstract

In this work we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigate the performance of different neural network architectures, including fully-convolutional, recurrent, and ResNet with GRU. Different features and normalization techniques are compared as well. We also propose a CTC-loss modification using segmentation during training, which leads to an improvement when decoding with a small beam size. Our best model achieved a word error rate of 45.8%, which is, to our knowledge, the best reported result for end-to-end systems using in-domain data for this task.
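
For readers unfamiliar with the metric quoted above, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference words. The following is a minimal sketch under stated assumptions: whitespace tokenization, no text normalization, and a hypothetical function name and example sentences that are not taken from the paper. The scoring setup actually used by the authors may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper); assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against four reference words.
wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors while the denominator is the reference length only, this ratio can exceed 1.0 for very poor hypotheses.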

Taxonomy

Methods: Average Pooling · Gated Recurrent Unit · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling