Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

Mittul Singh; Sami Virpioja; Peter Smit; Mikko Kurimo

arXiv:2005.13827·cs.CL·September 11, 2020

TL;DR

This paper introduces a method for approximating recurrent neural network language models (RNNLMs) with back-off n-gram models over subword units, improving out-of-vocabulary (OOV) keyword search by better handling data sparsity and long-span dependencies.

Contribution

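The paper proposes an RNNLM approximation technique that produces variable-order n-grams, including n-grams not observed in the training corpus, and interpolates the result with conventional n-gram models to improve OOV recognition in spoken keyword search.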

Findings

01 Interpolating the RNNLM approximation with conventional n-gram models improves OOV recognition.

02 The new approximation method outperforms baseline models on Arabic and Finnish keyword search tasks.

03 The enhanced models achieve a higher maximum term-weighted value (MTWV) for subword units.
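
For readers unfamiliar with the metric in the last finding, the sketch below shows how a maximum term-weighted value is commonly computed. It assumes the usual NIST-style definition, TWV = 1 - (P_miss + beta * P_FA), with beta = 999.9. Neither the formula nor the value of beta is stated on this page, so both are assumptions, and the operating points are made-up toy numbers rather than results from the paper.

```python
# Sketch of term-weighted value (TWV) and its maximum over thresholds (MTWV).
# ASSUMPTION: NIST-style definition with beta = 999.9; not taken from this page.

def twv(p_miss: float, p_fa: float, beta: float = 999.9) -> float:
    """TWV at one decision threshold, given term-averaged miss and false-alarm rates."""
    return 1.0 - (p_miss + beta * p_fa)


def mtwv(operating_points, beta: float = 999.9) -> float:
    """MTWV: the best TWV over all (p_miss, p_fa) pairs, one pair per threshold."""
    return max(twv(p_miss, p_fa, beta) for p_miss, p_fa in operating_points)


# Hypothetical (p_miss, p_fa) pairs, one per decision threshold (toy values).
toy_points = [(0.60, 0.00001), (0.45, 0.00005), (0.35, 0.0002)]
best = mtwv(toy_points)
print(f"MTWV = {best:.6f}")
```

Because beta is large, even a small false-alarm rate costs a lot, so the best threshold is usually not the one with the fewest misses.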

Abstract

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems but are not suitable for first-pass recognition as such. One way to solve this is to approximate the RNNLMs by back-off n-gram models. In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition. Furthermore, we develop a new RNNLM approximation method suitable for subword units: It produces variable-order n-grams to include long-span approximations and considers also n-grams that were not originally observed in the training corpus. To evaluate these…
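
The interpolation idea in the abstract can be illustrated with a minimal sketch of linear interpolation between two probability estimates. The excerpt does not say which interpolation scheme or weight the paper uses, so the linear form, the weight `lam`, and the toy distributions below are all assumptions chosen for illustration.

```python
# Sketch: linear interpolation of a conventional n-gram LM estimate with an
# n-gram approximation of an RNNLM.
# ASSUMPTION: the linear form, the weight `lam`, and all numbers are
# illustrative and not taken from the paper.

def interpolate(p_ngram: float, p_rnn_approx: float, lam: float = 0.5) -> float:
    """Return lam * p_ngram + (1 - lam) * p_rnn_approx for the same event."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_ngram + (1.0 - lam) * p_rnn_approx


# Toy conditional distributions P(next subword | history) over three subword units.
ngram = {"ta": 0.70, "lo": 0.25, "ssa": 0.05}
rnn_approx = {"ta": 0.40, "lo": 0.30, "ssa": 0.30}

mixed = {unit: interpolate(ngram[unit], rnn_approx[unit], lam=0.5) for unit in ngram}
print(mixed)
```

In this toy case the subword "ssa", which the sparse n-gram model scores very low, receives more probability mass after mixing, and the mixture remains a valid distribution because both inputs sum to one.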

Code & Models

Repositories

lallubharteja/KWS-Scripts (official)
