MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR
Dimitrios Damianos, Georgios Paraskevopoulos, Alexandros Potamianos

TL;DR
This paper presents MSDA, a two-stage domain adaptation method combining self-supervision and semi-supervised learning to improve ASR robustness, especially for low-resource languages and noisy data, achieving state-of-the-art results.
Contribution
Introduces MSDA, a novel multi-stage domain adaptation pipeline that effectively combines self-supervised learning with pseudo-labeling for ASR.
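As a rough illustration of the pseudo-labeling half of such a pipeline, the sketch below shows a generic confidence-filtered selection step of the kind used in self-training. It is not the paper's Meta PL procedure. The names `select_pseudo_labels` and `toy_teacher`, the 0.9 threshold, and the clip identifiers are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Tuple

Utterance = str  # stand-in for an audio clip identifier
Hypothesis = Tuple[str, float]  # (transcript, confidence in [0, 1])


def select_pseudo_labels(
    teacher: Callable[[Utterance], Hypothesis],
    unlabeled: List[Utterance],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[Tuple[Utterance, str]]:
    """Keep teacher transcripts whose confidence meets the threshold."""
    selected = []
    for utt in unlabeled:
        text, conf = teacher(utt)
        if conf >= threshold:
            selected.append((utt, text))
    return selected


def toy_teacher(utt: Utterance) -> Hypothesis:
    # Hypothetical lookup standing in for a real ASR model's decode step.
    table = {
        "clip_a": ("hello world", 0.97),
        "clip_b": ("good morning", 0.55),
        "clip_c": ("thank you", 0.91),
    }
    return table[utt]


pseudo_labeled = select_pseudo_labels(toy_teacher, ["clip_a", "clip_b", "clip_c"])
print(pseudo_labeled)
```

In a full self-training loop, the selected pairs would then serve as training targets for a student model on the target domain; that step is omitted here.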
Findings
MSDA significantly outperforms existing methods in low-resource and noisy scenarios.
The cascading approach enhances the effectiveness of combining self-supervision with self-training.
MSDA achieves state-of-the-art results on multiple ASR benchmarks.
Abstract
In this work, we investigate the Meta PL unsupervised domain adaptation framework for Automatic Speech Recognition (ASR). We introduce a Multi-Stage Domain Adaptation pipeline (MSDA), a sample-efficient, two-stage adaptation approach that integrates self-supervised learning with semi-supervised techniques. MSDA is designed to enhance the robustness and generalization of ASR models, making them more adaptable to diverse conditions. It is particularly effective for low-resource languages like Greek and in weakly supervised scenarios where labeled data is scarce or noisy. Through extensive experiments, we demonstrate that Meta PL can be applied effectively to ASR tasks, significantly outperforming prior state-of-the-art methods and providing more robust solutions for unsupervised domain adaptation in ASR. Our ablations highlight the necessity of utilizing a…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
