Acoustic-To-Word Model Without OOV

Jinyu Li; Guoli Ye; Rui Zhao; Jasha Droppo; Yifan Gong

arXiv:1711.10136 · cs.CL · November 29, 2017 · 1 citation

TL;DR

This paper presents a hybrid CTC model that predicts both words and characters to address the out-of-vocabulary issue in acoustic-to-word speech recognition, improving accuracy on a voice assistant task.

Contribution

The study introduces a hybrid CTC model with synchronized word and character outputs that effectively handles OOV and hot-words in end-to-end speech recognition.

Findings

1. Reduces OOV-related errors by 30% on the Microsoft Cortana task.
2. Enables recognition of hot-words that emerge after training.
3. Improves end-to-end speech recognition accuracy.

Abstract

Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion was shown to be a natural end-to-end model that directly targets words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue: it can only model a limited number of words in the output layer and maps all the remaining words to an OOV output node. Therefore, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. It also cannot easily handle hot-words that emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model that can predict both words and characters at the same time. With a shared-hidden-layer structure and modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized.…
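To make the mechanism concrete, here is a minimal Python sketch of the idea: a word-CTC vocabulary that maps infrequent words to an OOV node, and a decoder that fills each OOV slot with characters from a synchronized character-CTC branch. Everything below is an assumption for illustration, not the authors' implementation: the labels `<b>` and `<oov>`, the frequency-based vocabulary cut, greedy (best-path) CTC decoding, and the rule that an OOV word's characters are read from the character frames between the previous word emission and the OOV emission. The function names `build_vocab`, `map_target`, `ctc_segments`, and `hybrid_decode` are hypothetical.

```python
from collections import Counter
from itertools import groupby

BLANK = "<b>"  # hypothetical CTC blank label
OOV = "<oov>"  # hypothetical label of the OOV output node


def build_vocab(corpus_words, max_size):
    """Keep the max_size most frequent words as word-CTC output units."""
    return {w for w, _ in Counter(corpus_words).most_common(max_size)}


def map_target(words, vocab):
    """Map every word outside the output vocabulary to the OOV node."""
    return [w if w in vocab else OOV for w in words]


def ctc_segments(frame_labels):
    """Best-path CTC collapse that also keeps frame spans.

    Merges runs of identical labels, drops blanks, and returns
    (label, start_frame, end_frame_exclusive) for each emitted token.
    """
    segments, t = [], 0
    for label, run in groupby(frame_labels):
        n = sum(1 for _ in run)
        if label != BLANK:
            segments.append((label, t, t + n))
        t += n
    return segments


def hybrid_decode(word_frames, char_frames):
    """Replace each OOV word with characters from the char-CTC branch.

    Assumes both branches share one frame rate (shared hidden layers) and
    that an OOV word's characters lie between the previous word emission
    and the OOV emission. Simplification: a run of repeated characters
    split at that boundary would be emitted twice.
    """
    assert len(word_frames) == len(char_frames)
    words, prev_end = [], 0
    for label, _start, end in ctc_segments(word_frames):
        if label == OOV:
            chars = ctc_segments(char_frames[prev_end:end])
            words.append("".join(c for c, _, _ in chars))
        else:
            words.append(label)
        prev_end = end
    return words


# Toy per-frame argmax labels from the two output layers (made-up data).
word_frames = [BLANK, BLANK, BLANK, "play", BLANK, BLANK, BLANK, BLANK, OOV]
char_frames = ["p", "l", "a", "y", BLANK, "z", "u", "n", "e"]
print(hybrid_decode(word_frames, char_frames))  # → ['play', 'zune']
```

The word branch alone would output `play <oov>`; because the two branches are frame-synchronized, the character branch can spell out the missing word, which is also how a hot-word added after training could be recovered. A real system would need a more careful segmentation rule than this boundary heuristic.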

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques