Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models
Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

TL;DR
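This paper introduces a lattice-based unsupervised test-time adaptation method for neural network acoustic models. It supervises adaptation with lattices from a first-pass decoding, rather than a single one-best transcription, within the lattice-free maximum mutual information (LF-MMI) framework. This makes adaptation more robust, particularly when the initial transcriptions contain many errors.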
Contribution
It proposes a novel lattice-based discriminative adaptation approach that allows more parameters to be adapted without over-fitting, even when the initial transcriptions have a high word error rate (WER).
Findings
Effective adaptation on three transcription tasks of varying difficulty: TED talks, multi-genre broadcast (MGB), and Somali.
Enables adaptation even when the initial transcriptions have a WER above 50%.
Improves the robustness of acoustic models to unseen test conditions.
Abstract
Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions. Most adaptation schemes for neural network models require the use of an initial one-best transcription for the test data, generated by an unadapted model, in order to estimate the adaptation transform. It has been found that adaptation methods using discriminative objective functions, such as cross-entropy loss, often require careful regularisation to avoid over-fitting to errors in the one-best transcriptions. In this paper we solve this problem by performing discriminative adaptation using lattices obtained from a first-pass decoding, an approach that can be readily integrated into the lattice-free maximum mutual information (LF-MMI) framework. We investigate this approach on three transcription tasks of varying difficulty: TED talks, multi-genre broadcast (MGB)…
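To make the contrast with one-best supervision concrete, here is a minimal Python sketch. It is neither LF-MMI nor the authors' implementation. It makes two simplifications. First, it replaces the lattice with an explicit list of weighted alternative paths. Second, it uses a frame-level soft-target cross-entropy as a stand-in objective. Its only purpose is to illustrate why supervision that keeps competing hypotheses commits less strongly to errors in the one-best path. All function names, the toy lattice, and the model outputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy "lattice": an explicit list of alternative paths, each a
# frame-level state sequence plus its first-pass log score. Real lattices are
# compact graphs whose posteriors are computed with forward-backward.
Path = Tuple[Sequence[int], float]


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def lattice_targets(lattice: List[Path], num_states: int) -> List[List[float]]:
    """Per-frame state posteriors: each path adds its normalised weight."""
    log_total = logsumexp([score for _, score in lattice])
    num_frames = len(lattice[0][0])
    targets = [[0.0] * num_states for _ in range(num_frames)]
    for states, score in lattice:
        weight = math.exp(score - log_total)  # posterior probability of this path
        for t, state in enumerate(states):
            targets[t][state] += weight
    return targets


def one_best_targets(lattice: List[Path], num_states: int) -> List[List[float]]:
    """Hard (one-hot) targets from the single highest-scoring path only."""
    best_states, _ = max(lattice, key=lambda path: path[1])
    return [[1.0 if k == s else 0.0 for k in range(num_states)] for s in best_states]


def soft_cross_entropy(log_probs: List[List[float]], targets: List[List[float]]) -> float:
    """Sum over frames t and states k of -targets[t][k] * log_probs[t][k]."""
    return -sum(
        target * log_p
        for lp_frame, tgt_frame in zip(log_probs, targets)
        for log_p, target in zip(lp_frame, tgt_frame)
    )


# Two competing 3-frame hypotheses. The one-best (first path) is assumed to be
# wrong at frame 1, where the second path has state 2 instead of state 1.
lattice = [([0, 1, 1], math.log(0.6)), ([0, 2, 1], math.log(0.4))]
hard = one_best_targets(lattice, num_states=3)
soft = lattice_targets(lattice, num_states=3)

# Hypothetical model outputs that favour state 2 at frame 1.
model_log_probs = [
    [math.log(0.8), math.log(0.1), math.log(0.1)],
    [math.log(0.1), math.log(0.3), math.log(0.6)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
hard_ce = soft_cross_entropy(model_log_probs, hard)
soft_ce = soft_cross_entropy(model_log_probs, soft)

print([round(p, 3) for p in soft[1]])  # [0.0, 0.6, 0.4]
print(soft_ce < hard_ce)  # True
```

In this toy case, a model that keeps probability on the alternative state at frame 1 is penalised less under the lattice-derived targets than under the one-best targets. In LF-MMI the roughly analogous quantity is the numerator occupancy computed over the supervision graph. LF-MMI combines it with a denominator term, which this sketch omits.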
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
