Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation
Wei-Ning Hsu, Yu Zhang, James Glass

TL;DR
This paper introduces an unsupervised domain adaptation method for speech recognition using variational autoencoders to augment training data by transforming nuisance attributes, significantly improving robustness across domains.
Contribution
It proposes a novel VAE-based data augmentation technique that adapts speech models to new domains without requiring target domain transcripts.
Findings
Reduced word error rate (WER) by up to 35% on the CHiME-4 dataset
Effective domain adaptation without target transcripts
Improved robustness in real-world speech recognition
Abstract
Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue. In this paper, we address the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are presented, but word transcripts are only available for the source domain speech. We present novel augmentation-based methods that transform speech in a way that does not change the transcripts. Specifically, we first train a variational autoencoder on both source and target domain data (without supervision) to learn a latent representation of speech. We then transform nuisance attributes of…
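The latent-space transformation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the orthogonal linear `encode`/`decode` pair stands in for a trained VAE, and shifting latents by the difference between domain means is an assumed, simplified way of changing a nuisance attribute. All names (`FEAT_DIM`, `shift_to_target`, the toy data) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained VAE: an orthogonal linear map, so that
# decode(encode(x)) reproduces x. A real VAE would be a learned neural network.
FEAT_DIM = 8
W, _ = np.linalg.qr(rng.normal(size=(FEAT_DIM, FEAT_DIM)))

def encode(x):
    """Map feature frames of shape (n, FEAT_DIM) to latent vectors."""
    return x @ W

def decode(z):
    """Map latent vectors back to feature frames."""
    return z @ W.T

def shift_to_target(source_frames, target_frames):
    """Shift source latents by the gap between the two domains' latent means.

    Assumption: the domain-level nuisance factor is captured by the mean
    latent vector of each domain, which is a simplification for illustration.
    """
    z_src = encode(source_frames)
    z_tgt = encode(target_frames)
    nuisance_shift = z_tgt.mean(axis=0) - z_src.mean(axis=0)
    return decode(z_src + nuisance_shift)

# Toy data: the target domain is drawn with a constant offset from the source.
source = rng.normal(size=(100, FEAT_DIM))
target = rng.normal(size=(120, FEAT_DIM)) + 3.0

augmented = shift_to_target(source, target)
# The transcripts of `source` would be reused unchanged for `augmented`,
# since only the assumed nuisance component of the latent code was modified.
```

In this sketch the augmented frames keep their frame-by-frame variation but take on the target domain's average characteristics, which is the sense in which labeled source speech can be made to resemble unlabeled target speech.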
