Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

Wei-Ning Hsu; Hao Tang; James Glass

arXiv:1806.04872 · cs.CL · June 14, 2018

TL;DR

This paper introduces an unsupervised adaptation technique for speech recognition. It synthesizes labeled target-domain data from labeled out-of-domain speech and unlabeled in-domain speech by disentangling linguistic factors from nuisance factors such as speaker and channel, which improves recognition of distant conversational speech.

Contribution

The paper presents a method that learns interpretable speech representations without supervision and adapts recognition models without any labeled in-domain data, addressing domain mismatch in speech recognition.

Findings

01 Outperforms all baselines on the AMI dataset

02 Bridges over 77% of the gap between unadapted and in-domain models

03 Effectively handles channel mismatch in conversational speech
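
One common way to read the "gap bridged" figure in finding 02 is as the share of the error-rate difference between the unadapted model and the in-domain model that the adapted model recovers. The sketch below assumes that reading. The function name `gap_bridged` is hypothetical, and the word error rates in the usage note are illustrative round numbers, not results from the paper.

```python
def gap_bridged(wer_unadapted, wer_adapted, wer_in_domain):
    """Fraction of the unadapted-to-in-domain WER gap closed by adaptation.

    Assumed reading of "bridging the gap": 0.0 means the adapted model is no
    better than the unadapted one, 1.0 means it matches the in-domain model.
    """
    gap = wer_unadapted - wer_in_domain
    if gap <= 0:
        # Without a positive gap the ratio is undefined or meaningless.
        raise ValueError("unadapted WER must exceed in-domain WER")
    return (wer_unadapted - wer_adapted) / gap
```

For example, with made-up word error rates of 50.0 (unadapted), 30.0 (adapted), and 25.0 (in-domain), `gap_bridged(50.0, 30.0, 25.0)` gives 20.0 / 25.0, meaning 80% of the gap is closed.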

Abstract

The current trend in automatic speech recognition is to leverage large amounts of labeled data to train supervised neural network models. Unfortunately, obtaining data for a wide range of domains to train robust models can be costly. However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to. In this paper, we propose a novel unsupervised adaptation method that learns to synthesize labeled data for the target domain from unlabeled in-domain data and labeled out-of-domain data. We first learn without supervision an interpretable latent representation of speech that encodes linguistic and nuisance factors (e.g., speaker and channel) using different latent variables. To transform a labeled out-of-domain utterance without altering its transcript, we transform the latent nuisance variables while maintaining the…
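
The transformation step described in the abstract can be illustrated with a toy sketch. It assumes that an utterance has already been encoded into two separate sets of latent variables, one for linguistic content and one for nuisance factors, and that only the nuisance part is swapped for one taken from the target domain. The function name `transform_nuisance`, the array shapes, and the use of a single nuisance vector per utterance are all assumptions made for illustration. The encoder and decoder are omitted, and this is not the authors' implementation.

```python
import numpy as np

def transform_nuisance(z_linguistic, z_nuisance_target):
    """Pair per-segment linguistic latents with a target-domain nuisance latent.

    z_linguistic:      (num_segments, d_ling) latents from a labeled
                       out-of-domain utterance (hypothetical shape).
    z_nuisance_target: (d_nuis,) latent taken from unlabeled in-domain
                       speech (hypothetical shape).
    Returns an array of shape (num_segments, d_ling + d_nuis).
    """
    z_linguistic = np.asarray(z_linguistic, dtype=float)
    z_nuisance_target = np.asarray(z_nuisance_target, dtype=float)
    # Repeat the single nuisance vector for every segment of the utterance.
    tiled = np.broadcast_to(
        z_nuisance_target,
        (z_linguistic.shape[0], z_nuisance_target.shape[-1]),
    )
    # The linguistic part is copied unchanged, so the transcript still applies.
    return np.concatenate([z_linguistic, tiled], axis=1)
```

Because the linguistic latents pass through untouched, the original transcript can be reused as the label for whatever a decoder would generate from the returned latents.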
