Differentiable Pooling for Unsupervised Acoustic Model Adaptation
Pawel Swietojanski, Steve Renals

TL;DR
This paper introduces differentiable pooling operators in deep neural network acoustic models to enable effective unsupervised speaker adaptation, leading to significant word error rate reductions across multiple speech recognition datasets.
Contribution
It proposes novel parametrised, differentiable pooling methods for acoustic model adaptation, demonstrating their robustness and effectiveness in unsupervised scenarios.
Findings
Relative word error rate reductions of 5-20% over unadapted systems
Differentiable pooling provides low-dimensional, robust adaptation
Effective across diverse speech recognition corpora
Abstract
We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators. Unsupervised acoustic model adaptation is cast as the problem of updating the decision boundaries implemented by each pooling operator. In particular, we experiment with two types of pooling parametrisations: learned Lp-norm pooling and weighted Gaussian pooling, in which the weights of both operators are treated as speaker-dependent. We perform investigations using three different large vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard conversational telephone speech. We demonstrate that differentiable pooling operators provide a robust and relatively low-dimensional way to adapt acoustic models, with relative word error rate reductions ranging from 5-20% with respect to unadapted systems, which themselves are better than the baseline…
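The two pooling parametrisations named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so the forms below are assumptions: Lp-norm pooling is taken as a per-unit normalised p-norm with a learnable order `p`, and weighted Gaussian pooling as a weighted sum whose weights come from a Gaussian kernel with a learnable mean `mu` and precision `beta`. The function and parameter names are hypothetical, and the paper's own definitions may differ in normalisation and detail.

```python
import numpy as np


def lp_norm_pool(acts, p):
    """Assumed Lp-norm pooling: (mean_k |a_k|^p)^(1/p) per pooling unit.

    acts: array of shape (num_units, pool_size), the activations in each pool.
    p:    array of shape (num_units,), the per-unit order (assumed p >= 1).
          In an adaptation setting these would be the speaker-dependent
          parameters; here they are simply inputs.
    """
    acts = np.asarray(acts, dtype=float)
    p = np.asarray(p, dtype=float).reshape(-1, 1)
    mean_pow = np.mean(np.abs(acts) ** p, axis=1, keepdims=True)
    return (mean_pow ** (1.0 / p)).ravel()


def gaussian_pool(acts, mu, beta):
    """Assumed weighted Gaussian pooling: sum_k w_k * a_k.

    The weights w_k are a Gaussian kernel of a_k around mu with precision
    beta, normalised to sum to one within each pool. mu and beta have shape
    (num_units,) and play the role of speaker-dependent parameters.
    """
    acts = np.asarray(acts, dtype=float)
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    beta = np.asarray(beta, dtype=float).reshape(-1, 1)
    weights = np.exp(-0.5 * beta * (acts - mu) ** 2)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights * acts).sum(axis=1)


if __name__ == "__main__":
    pool = np.array([[3.0, 4.0]])  # one pooling unit with two inputs
    print(lp_norm_pool(pool, p=[1.0]))    # average of absolute values
    print(lp_norm_pool(pool, p=[200.0]))  # close to max pooling
    print(gaussian_pool(pool, mu=[4.0], beta=[10.0]))  # weight concentrated near 4
```

Under these assumed forms, each pooling unit carries only one or two adaptable scalars (`p`, or `mu` and `beta`), which is one way to read the abstract's claim that the adaptation is relatively low-dimensional. Both operators are differentiable in their parameters, so they could in principle be updated by gradient descent on adaptation data.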
