Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

Ondrej Klejch; Joachim Fainberg; Peter Bell; Steve Renals

arXiv:1906.11521 · cs.CL · June 28, 2019 · 6 cites

TL;DR

This paper introduces a lattice-based unsupervised test-time adaptation method for neural network acoustic models that improves adaptation robustness, especially in high-error transcription scenarios, by leveraging lattice information within the LF-MMI framework.

Contribution

It proposes a novel lattice-based discriminative adaptation approach that allows more parameters to be adapted without over-fitting, even with high WER initial transcriptions.

Findings

01

The method adapts effectively across transcription tasks of varying difficulty: TED talks, MGB, and Somali.

02

It enables adaptation even when the initial transcription has a WER above 50%.

03

It makes acoustic models more robust to unseen test conditions.

Abstract

Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions. Most adaptation schemes for neural network models require the use of an initial one-best transcription for the test data, generated by an unadapted model, in order to estimate the adaptation transform. It has been found that adaptation methods using discriminative objective functions - such as cross-entropy loss - often require careful regularisation to avoid over-fitting to errors in the one-best transcriptions. In this paper we solve this problem by performing discriminative adaptation using lattices obtained from a first pass decoding, an approach that can be readily integrated into the lattice-free maximum mutual information (LF-MMI) framework. We investigate this approach on three transcription tasks of varying difficulty: TED talks, multi-genre broadcast (MGB)…
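
The over-fitting problem described in the abstract can be illustrated with a toy, frame-level sketch. This is not the paper's LF-MMI implementation, which works at the sequence level with lattices; it only shows why a target that keeps alternative hypotheses alive is less likely to reward a model for copying a first-pass error. All numbers, variable names, and the three-label setup below are hypothetical.

```python
import math


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical single frame with three candidate labels; label 1 is the true one.
# The first-pass one-best transcription wrongly picked label 0.
one_best_target = [1.0, 0.0, 0.0]
# Hypothetical lattice-derived posteriors: the true label remains a competitor.
lattice_target = [0.5, 0.4, 0.1]

# Two hypothetical outputs of an adapted model for this frame.
copies_the_error = [0.98, 0.01, 0.01]     # has over-fitted to the one-best error
favours_true_label = [0.30, 0.65, 0.05]   # leans toward the correct label

for name, prediction in [
    ("copies_the_error", copies_the_error),
    ("favours_true_label", favours_true_label),
]:
    hard = cross_entropy(one_best_target, prediction)
    soft = cross_entropy(lattice_target, prediction)
    print(f"{name}: one-best loss = {hard:.3f}, lattice loss = {soft:.3f}")
```

With the one-best target, the loss is lowest for the model that reproduces the transcription error. With the softer lattice-style target, the model that favours the true label scores better, which is the intuition behind using first-pass lattices as supervision.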

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques