Character-Level Incremental Speech Recognition with Recurrent Neural Networks

Kyuyeon Hwang; Wonyong Sung

arXiv:1601.06581·cs.CL·June 29, 2016

TL;DR

This paper presents a real-time, character-level incremental speech recognition system using RNNs trained with CTC, capable of low-latency processing and OOV word dictation, with competitive accuracy on WSJ data.

Contribution

The paper introduces a tree-based online beam search with additional depth-pruning for low-latency, incremental speech recognition with a CTC-trained RNN, enabling real-time response on arbitrarily long input and recognition of OOV words.
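
As a rough illustration of the kind of decoding this builds on, the sketch below runs a generic CTC prefix beam search frame by frame and yields the current best hypothesis after each frame, so the partial result can change as more input arrives. It is not the authors' algorithm: it has no RNN language model and no depth-pruning, and the function name, the `beam_width` parameter, and the toy alphabet are assumptions made for this example.

```python
from collections import defaultdict


def incremental_ctc_beam_search(frames, alphabet, beam_width=4, blank=0):
    """Yield the best partial transcript after each frame (illustrative only).

    frames:   iterable of per-frame probability lists, one entry per label
    alphabet: label strings; alphabet[blank] is the CTC blank symbol
    """
    # Each prefix keeps two scores: probability of paths ending in blank,
    # and probability of paths ending in a non-blank label.
    beams = {(): (1.0, 0.0)}
    for frame in frames:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # Repeated label without a blank collapses into the prefix.
                    nxt[prefix][1] += p_nb * p
                    # A repeat after a blank starts a genuinely new character.
                    nxt[prefix + (c,)][1] += p_b * p
                else:
                    nxt[prefix + (c,)][1] += (p_b + p_nb) * p
        # Width pruning: keep only the most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)[:beam_width]
        beams = {prefix: (s[0], s[1]) for prefix, s in ranked}
        best_prefix = ranked[0][0]
        yield "".join(alphabet[i] for i in best_prefix)


# Toy input: blank "_" plus two characters, four frames.
toy_alphabet = ["_", "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
partials = list(incremental_ctc_beam_search(toy_frames, toy_alphabet))
print(partials)
```

On this toy input the hypothesis starts as "a" and becomes "ab" once the last frame arrives. A real system would consume the RNN's per-frame outputs instead of a hand-written list, and would need some additional pruning to bound memory on unbounded input, which is the role the paper assigns to depth-pruning.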

Findings

01. Achieved 8.90% WER on the WSJ Nov'92 set.

02. Developed a low-latency online beam search algorithm.

03. The system can recognize out-of-vocabulary words based on pronunciation.
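
For readers unfamiliar with the WER metric quoted in the findings, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below shows that conventional definition only; it is not the scoring script used in the paper, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.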

Abstract

In real-time speech recognition applications, the latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even during the speech, where the hypotheses are gradually improved while the speaking proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), which is end-to-end trained with connectionist temporal classification (CTC), and an RNN-based character-level language model (LM). The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding. The RNN LM augments the decoding by providing long-term dependency information. We propose tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency. This system not only responds quickly on…

Code & Models

Repositories

nikhilrathaur/Handwriting-to-Digital-Text
