Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

TL;DR
URSA-GAN is a unified generative framework for cross-domain speech recognition and speech enhancement that uses dual embeddings and stochastic perturbation to improve robustness against unseen noise and channel distortions.
Contribution
This paper introduces URSA-GAN, a domain-aware GAN-based model that uses dual embeddings and stochastic perturbation for robust speech adaptation across diverse conditions.
Findings
Reduces character error rates in ASR by 16.16%.
Improves perceptual metrics in speech enhancement by 15.58%.
Demonstrates strong generalization to unseen noise and channel conditions.
Abstract
Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models often suffer from severe performance degradation when confronted with domain shifts, particularly in the presence of unseen noise and channel distortions. To address this, we present URSA-GAN, a unified and domain-aware generative framework specifically designed to mitigate mismatches in both noise and channel conditions. URSA-GAN leverages a dual-embedding architecture that consists of a noise encoder and a channel encoder, each pre-trained with limited in-domain data to capture domain-relevant representations. These embeddings condition a GAN-based speech generator, facilitating the synthesis of speech that is acoustically aligned with the target domain while preserving phonetic…
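The dual-embedding conditioning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two embeddings are combined, how the generator consumes them, or what form the stochastic perturbation takes. The sketch below assumes, purely for illustration, that the noise and channel embeddings are concatenated into one conditioning vector, that the perturbation is zero-mean Gaussian noise added to each embedding, and that the conditioning vector is appended to every input feature frame. All function names and parameters (`perturb`, `build_condition`, `condition_frames`, `sigma`) are hypothetical.

```python
import random


def perturb(embedding, sigma, rng):
    """Add zero-mean Gaussian noise to each dimension (assumed perturbation form)."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def build_condition(noise_emb, channel_emb, sigma=0.0, rng=None):
    """Combine a noise embedding and a channel embedding into one vector.

    Concatenation is an assumption; the source does not specify the fusion.
    """
    rng = rng or random.Random()
    if sigma > 0.0:
        noise_emb = perturb(noise_emb, sigma, rng)
        channel_emb = perturb(channel_emb, sigma, rng)
    return list(noise_emb) + list(channel_emb)


def condition_frames(frames, condition):
    """Append the same conditioning vector to every feature frame."""
    return [list(frame) + list(condition) for frame in frames]


# Toy values standing in for encoder outputs and speech features.
noise_emb = [1.0, 2.0]
channel_emb = [3.0]
frames = [[0.1, 0.2], [0.3, 0.4]]

clean_condition = build_condition(noise_emb, channel_emb)
noisy_condition = build_condition(noise_emb, channel_emb, sigma=0.1,
                                  rng=random.Random(0))
generator_input = condition_frames(frames, clean_condition)
```

In this sketch, `generator_input` is what a conditional generator would receive: each frame carries its own features followed by the shared domain description. Setting `sigma` above zero changes the conditioning vector on every call, which is one simple way to expose a generator to variations around the target-domain embeddings.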
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Face recognition and analysis
