Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification
Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

TL;DR
This paper introduces a joint learning approach using time-domain GANs to simultaneously perform domain adaptation and bandwidth extension for speech, improving speaker verification accuracy across different acoustic conditions.
Contribution
It proposes a novel joint training framework for domain adaptation and bandwidth extension using GANs, with both parallel and non-parallel data, outperforming separate models.
Findings
22% relative EER improvement on SRE16
Effective use of both paired and unpaired data
First evidence that learning the two tasks jointly outperforms learning them separately
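For readers unfamiliar with the metric, a relative EER improvement is the reduction in equal error rate expressed as a fraction of the baseline EER. The minimal sketch below shows the arithmetic; the function name and the EER values (10.0% and 7.8%) are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_eer_improvement(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER improvement in percent.

    Both arguments are equal error rates in the same unit (e.g. percent).
    A positive result means the new system has a lower EER than the baseline.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer * 100.0


# Hypothetical values for illustration only (not reported in the paper).
improvement = relative_eer_improvement(baseline_eer=10.0, new_eer=7.8)
print(f"{improvement:.1f}%")  # prints 22.0%
```

Note that a relative improvement depends on the baseline: the same 22% could correspond to a large or small absolute change in EER.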
Abstract
Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. Contrary to this, we propose to learn both tasks together. Particularly, we learn to map narrowband conversational telephone speech to wideband microphone speech. We developed parallel and non-parallel learning solutions which utilize both paired and unpaired data. First, we discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training. We evaluated our schemes on a Speaker Verification downstream task. We used the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
