Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
Pawel Swietojanski, Jinyu Li, Steve Renals

TL;DR
This paper studies learning hidden unit contributions (LHUC), a method for unsupervised speaker and environment adaptation of neural network acoustic models, and shows that it improves speech recognition accuracy on four benchmarks without auxiliary speaker-dependent feature extractors.
Contribution
The paper extends LHUC to a speaker adaptive training (SAT) framework, yielding a more adaptable DNN acoustic model without auxiliary speaker-dependent feature extractors or significant speaker-dependent changes to the DNN structure.
Findings
LHUC achieves 5-23% relative word error rate (WER) reductions across the four benchmarks (TED talks, Switchboard, AMI meetings, and Aurora4).
The method works with limited adaptation data and in one-shot scenarios.
LHUC complements other adaptation techniques effectively.
Abstract
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data. We also extend LHUC to a speaker adaptive training (SAT) framework that leads to a more adaptable DNN acoustic model, working both in a speaker-dependent and a speaker-independent manner, without the requirements to maintain auxiliary speaker-dependent feature extractors or to introduce significant speaker-dependent changes to the DNN structure. Through a series of experiments on four different speech recognition benchmarks (TED talks, Switchboard, AMI meetings, and Aurora4) comprising 270 test speakers, we show that LHUC in both its test-only and SAT variants results in consistent word error rate…
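The re-combination of hidden units described in the abstract can be illustrated with a minimal sketch. It assumes a single sigmoid hidden layer and a per-speaker amplitude of the form 2*sigmoid(r) for each hidden unit; that parameterisation, the toy weights, and all function names here are illustrative assumptions, not details stated in the text above. Only the per-speaker vector r would be learned from adaptation data; the layer weights stay fixed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_layer(inputs, weights, biases):
    """Speaker-independent hidden layer: sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def lhuc_amplitudes(r):
    """Map unconstrained per-unit parameters r to amplitudes in (0, 2).

    Assumption: amplitude = 2 * sigmoid(r), so r = 0 gives amplitude 1
    and leaves the speaker-independent model unchanged.
    """
    return [2.0 * sigmoid(r_k) for r_k in r]


def lhuc_layer(inputs, weights, biases, r):
    """Hidden layer whose unit outputs are rescaled per speaker."""
    h = hidden_layer(inputs, weights, biases)
    a = lhuc_amplitudes(r)
    return [a_k * h_k for a_k, h_k in zip(a, h)]


# Toy layer with 2 inputs and 3 hidden units (hypothetical values).
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
biases = [0.0, 0.1, -0.1]
x = [1.0, 2.0]

r_unadapted = [0.0, 0.0, 0.0]   # no adaptation: amplitudes are all 1
r_speaker = [1.0, -1.0, 0.0]    # hypothetical speaker-dependent parameters

h_si = hidden_layer(x, weights, biases)
h_unadapted = lhuc_layer(x, weights, biases, r_unadapted)
h_adapted = lhuc_layer(x, weights, biases, r_speaker)
```

In this sketch the adapted layer boosts the first unit, attenuates the second, and leaves the third untouched, while the number of speaker-dependent parameters equals the number of hidden units rather than the number of weights.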
