Learning to retrieve out-of-vocabulary words in speech recognition

Imran Sheikh; Irina Illina; Dominique Fohr; Georges Linarès

arXiv:1511.05389·cs.CL·March 2, 2016·5 cites

TL;DR

This paper introduces two neural network models, D-CBOW and D-CBOW2, designed to retrieve relevant out-of-vocabulary proper names in speech recognition by leveraging semantic context, improving over traditional embedding methods.

Contribution

The paper proposes novel neural network models with a context anchor layer for effective retrieval of OOV proper names in speech recognition tasks.

Findings

01. Both models outperform baseline embedding methods on French broadcast news.

02. D-CBOW2's context anchor enhances key-word importance detection.

03. Combining models accelerates training convergence.

Abstract

Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To help recovery of the PNs missed by the system, relevant OOV PNs can be retrieved out of the many OOVs by exploiting semantic context of the spoken content. In this paper, we propose two neural network models targeted to retrieve OOV PNs relevant to an audio document: (a) Document level Continuous Bag of Words (D-CBOW), (b) Document level Continuous Bag of Weighted Words (D-CBOW2). Both these models take document words as input and learn with an objective to maximise the retrieval of co-occurring OOV PNs. With the D-CBOW2 model we propose a new approach in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and has the ability to capture (task specific) key-words in a bag-of-word…
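
To make the difference between the two models concrete, here is a minimal NumPy sketch of the forward pass only. It is an illustration under stated assumptions, not the authors' implementation: the sizes, the random parameters, the function names (`dcbow_scores`, `dcbow2_scores`), and in particular the way the anchor layer turns into a per-word importance (a sigmoid of the dot product between a word's embedding and its anchor vector) are all hypothetical choices made for this example. The abstract only states that the anchor layer learns to assign importance to input words.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 50   # in-vocabulary words that can appear in a document
NUM_OOV = 8       # candidate OOV proper names to rank
DIM = 16          # embedding dimension

# Randomly initialised parameters; in a real model these would be trained
# to maximise the retrieval of OOV PNs that co-occur with the document.
word_emb = rng.normal(scale=0.1, size=(VOCAB_SIZE, DIM))  # input embeddings
anchor = rng.normal(scale=0.1, size=(VOCAB_SIZE, DIM))    # assumed anchor vectors
oov_out = rng.normal(scale=0.1, size=(NUM_OOV, DIM))      # output layer over OOV PNs


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def dcbow_scores(doc_word_ids):
    """D-CBOW style: plain average of the document's word embeddings."""
    doc_vec = word_emb[doc_word_ids].mean(axis=0)
    return softmax(oov_out @ doc_vec)


def dcbow2_scores(doc_word_ids):
    """D-CBOW2 style: importance-weighted average of word embeddings.

    ASSUMPTION: importance = sigmoid(embedding . anchor) per input word.
    The paper's exact formulation may differ.
    """
    vecs = word_emb[doc_word_ids]
    logits = np.sum(vecs * anchor[doc_word_ids], axis=1)
    importance = 1.0 / (1.0 + np.exp(-logits))
    doc_vec = (importance[:, None] * vecs).sum(axis=0) / importance.sum()
    return softmax(oov_out @ doc_vec), importance


# A toy "document" given as a bag of in-vocabulary word ids.
doc = np.array([3, 17, 17, 42, 8])

p_dcbow = dcbow_scores(doc)
p_dcbow2, word_importance = dcbow2_scores(doc)

# Rank candidate OOV PNs from most to least relevant for this document.
ranking = np.argsort(-p_dcbow2)
```

With trained parameters, `ranking` would give the order in which candidate OOV PNs are proposed for the document, and `word_importance` would show which input words the weighted model treats as key-words. With the random parameters used here, the values carry no meaning beyond demonstrating the shapes and the data flow.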

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Linear Discriminant Analysis