Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition

Mortaza (Morrie) Doulaty; Thomas Hain

arXiv:1907.01302 · cs.CL · July 3, 2019 · 1 citation

TL;DR

This paper introduces a novel data selection method using Acoustic Latent Dirichlet Allocation (aLDA) to improve automatic speech recognition by choosing the most relevant training data from large, diverse datasets.

Contribution

The paper proposes aLDA as a data similarity criterion for selecting in-domain training data, and reports better speech recognition performance than random and posterior-based selection.

Findings

1. aLDA-based selection outperforms random and posterior-based selection.
2. Training on the selected data improves speech recognition accuracy.
3. The method handles large, diverse data pools effectively.

Abstract

Selecting in-domain data from a large pool of diverse and out-of-domain data is a non-trivial problem. In most cases simply using all of the available data will lead to sub-optimal and in some cases even worse performance compared to carefully selecting a matching set. This is true even for data-inefficient neural models. Acoustic Latent Dirichlet Allocation (aLDA) is shown to be useful in a variety of speech technology related tasks, including domain adaptation of acoustic models for automatic speech recognition and entity labeling for information retrieval. In this paper we propose to use aLDA as a data similarity criterion in a data selection framework. Given a large pool of out-of-domain and potentially mismatched data, the task is to select the best-matching training data to a set of representative utterances sampled from a target domain. Our target data consists of around 32 hours…
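The selection task described in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a topic posterior vector by some aLDA-style model, and it ranks pool utterances by cosine similarity to the mean target vector. The function name `select_by_topic_similarity`, the cosine measure, the centroid comparison, and the fixed `budget` are all illustrative assumptions; this page does not state the paper's actual similarity criterion or selection procedure.

```python
import numpy as np


def select_by_topic_similarity(pool_topics, target_topics, budget):
    """Rank pool utterances by similarity to the target domain.

    Hypothetical helper, not the paper's method. Inputs are assumed to be
    per-utterance topic posterior vectors (one row per utterance) that were
    computed elsewhere.
    """
    pool = np.asarray(pool_topics, dtype=float)
    target = np.asarray(target_topics, dtype=float)

    # Summarise the target domain by its mean topic vector (an assumption).
    centroid = target.mean(axis=0)

    # Cosine similarity between every pool utterance and the target centroid.
    eps = 1e-12  # guards against division by zero for all-zero rows
    norms = np.linalg.norm(pool, axis=1) * np.linalg.norm(centroid)
    sims = (pool @ centroid) / (norms + eps)

    # Keep the `budget` most similar utterances, best match first.
    order = np.argsort(-sims, kind="stable")
    return order[:budget], sims


# Toy data: 4 pool utterances and 2 target utterances over 3 topics.
pool = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
target = [
    [0.7, 0.2, 0.1],
    [0.9, 0.05, 0.05],
]
selected, scores = select_by_topic_similarity(pool, target, budget=2)
print(selected.tolist())
```

In practice the budget would more likely be expressed in hours of audio than in utterance counts, and a divergence between topic distributions could replace cosine similarity; both choices are left open here.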

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques