Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

Hirofumi Inaguma; Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara

arXiv:1909.09993 · cs.CL · September 27, 2019

TL;DR

This paper proposes using external language models combined with acoustic-to-character models to improve OOV detection and resolution in acoustic-to-word ASR systems, especially in out-of-domain scenarios.

Contribution

The paper uses external language models, trained only on transcriptions, to detect OOV words in the output of A2W ASR systems, and an acoustic-to-character (A2C) model to resolve the detected words.

Findings

01. External LMs reduce recognition errors and improve OOV detection.

02. The method improves performance on both English and Japanese corpora.

03. Vocabulary size can be reduced with minimal performance loss.

Abstract

Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) systems have attracted attention because of their extremely simplified architecture and fast decoding. To alleviate data sparseness issues due to infrequent words, the combination with an acoustic-to-character (A2C) model is investigated. Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words. A2W models learn contexts from both acoustics and transcripts; therefore, they tend to falsely recognize OOV words as words in the vocabulary. In this paper, we tackle this problem by using external language models (LMs), which are trained only with transcriptions and have better linguistic information to detect OOV words. The A2C model is used to resolve these OOV words. Experimental evaluations show that external LMs have…
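The detect-then-resolve idea in the abstract can be illustrated with a toy sketch. The abstract does not say how the external LM is combined with the A2W model, so the log-linear interpolation below (often called shallow fusion) is an assumption, as are the `<OOV>` symbol, the `lm_weight` value, and all function names. The sketch only shows how an LM score could tip a single decoding step toward the OOV class, after which a character string from an A2C model is substituted.

```python
import math

OOV = "<OOV>"  # hypothetical symbol for the out-of-vocabulary class
LOG_FLOOR = math.log(1e-10)  # assumed floor for words the LM has no score for


def fuse_scores(a2w_logp, lm_logp, lm_weight):
    """Add weighted external-LM log-probabilities to A2W log-probabilities.

    This log-linear interpolation is an assumed integration scheme,
    not necessarily the one used in the paper.
    """
    return {
        word: score + lm_weight * lm_logp.get(word, LOG_FLOOR)
        for word, score in a2w_logp.items()
    }


def pick_word(a2w_logp, lm_logp, lm_weight=0.5):
    """Return the highest-scoring word for one decoding step after fusion."""
    fused = fuse_scores(a2w_logp, lm_logp, lm_weight)
    return max(fused, key=fused.get)


def resolve_oov(words, a2c_strings):
    """Replace each detected <OOV> with the next A2C character string.

    `a2c_strings` is assumed to hold one string per <OOV> token, in order.
    """
    replacements = iter(a2c_strings)
    return [next(replacements) if w == OOV else w for w in words]
```

With `lm_weight=0.0` the choice depends on the A2W scores alone, so comparing it against a positive weight on the same inputs shows whether the LM changes the decision toward `<OOV>`.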

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: A2C