Differentiable Pooling for Unsupervised Acoustic Model Adaptation

Pawel Swietojanski; Steve Renals

arXiv:1603.09630·cs.CL·July 14, 2016

TL;DR

This paper introduces parametrised, differentiable pooling operators in deep neural network acoustic models to enable unsupervised speaker adaptation, yielding relative word error rate reductions of 5-20% over unadapted systems on the AMI, TED and Switchboard corpora.

Contribution

It proposes novel parametrised, differentiable pooling methods for acoustic model adaptation, demonstrating their robustness and effectiveness in unsupervised scenarios.

Findings

01

Relative word error rate reductions of 5-20% with respect to unadapted systems

02

Differentiable pooling provides low-dimensional, robust adaptation

03

Effective across three large vocabulary corpora: AMI meetings, TED talks and Switchboard conversational telephone speech

Abstract

We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators. Unsupervised acoustic model adaptation is cast as the problem of updating the decision boundaries implemented by each pooling operator. In particular, we experiment with two types of pooling parametrisations: learned $L_{p}$-norm pooling and weighted Gaussian pooling, in which the weights of both operators are treated as speaker-dependent. We perform investigations using three different large vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard conversational telephone speech. We demonstrate that differentiable pooling operators provide a robust and relatively low-dimensional way to adapt acoustic models, with relative word error rate reductions ranging from 5-20% with respect to unadapted systems, which themselves are better than the baseline…
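
The two pooling parametrisations named in the abstract can be illustrated with a minimal NumPy sketch. The exact formulas used in the paper are not given in this excerpt, so the forms below are assumptions: `lp_pool` uses the common mean-based $L_{p}$ norm with an adjustable order `p`, and `gauss_pool` weights each activation by a Gaussian kernel with hypothetical parameters `mu` (mean) and `beta` (precision), normalised to sum to one. The function and parameter names are illustrative only. In an adaptation setting, `p`, `mu` and `beta` would be the small set of speaker-dependent values updated by gradient descent, since both operators are differentiable in them.

```python
import numpy as np

def lp_pool(acts, p):
    """L_p-norm pooling over the last axis: (mean(|a|^p))^(1/p).

    p = 1 gives the mean absolute activation; large p approaches max pooling.
    Assumed form, not taken from the paper.
    """
    acts = np.asarray(acts, dtype=float)
    return np.mean(np.abs(acts) ** p, axis=-1) ** (1.0 / p)

def gauss_pool(acts, mu, beta):
    """Gaussian-weighted pooling over the last axis.

    Each activation a_k gets weight exp(-beta/2 * (a_k - mu)^2); the weights
    are normalised to sum to one (an assumption made for this sketch).
    """
    acts = np.asarray(acts, dtype=float)
    weights = np.exp(-0.5 * beta * (acts - mu) ** 2)
    weights = weights / np.sum(weights, axis=-1, keepdims=True)
    return np.sum(weights * acts, axis=-1)

# One pooling region holding two unit activations.
region = np.array([3.0, 4.0])
print(lp_pool(region, p=1.0))               # mean of absolute values
print(lp_pool(region, p=2.0))               # root mean square
print(gauss_pool(region, mu=4.0, beta=0.0)) # beta = 0 reduces to average pooling
```

Changing `p` moves the operator smoothly between average-like and max-like behaviour, and changing `mu` and `beta` shifts which activations dominate the pooled output. This is one way to read the abstract's description of adaptation as updating the decision boundary implemented by each pooling operator.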
