TL;DR
This paper explores CycleGANs for low-resource domain adaptation in speaker recognition, demonstrating improvements in challenging scenarios such as reverberant-to-clean adaptation with limited target-domain data.
Contribution
It applies CycleGAN-based unsupervised domain adaptation to speaker recognition in low-resource settings, including a novel reverberant-to-clean adaptation task.
Findings
Achieved an 18.3% relative reduction in equal error rate (EER) on the VOiCES dataset
Improved over state-of-the-art weighted prediction error (WPE) de-reverberation
Remained effective even with limited target-domain data
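As a reading aid for the "relative EER reduction" figure above, here is a minimal sketch of how such a number is usually computed. The function name and the example EER values are hypothetical and chosen only for illustration; this summary does not give the paper's absolute EERs.

```python
def relative_eer_reduction(baseline_eer: float, adapted_eer: float) -> float:
    """Return the relative EER reduction, in percent, of adapted vs. baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - adapted_eer) / baseline_eer


# Hypothetical EER values (in percent), NOT taken from the paper.
example = relative_eer_reduction(baseline_eer=10.0, adapted_eer=8.17)
print(f"{example:.1f}% relative reduction")
```

A relative reduction is measured against the baseline system's own error rate, so the same percentage can correspond to very different absolute gains depending on how high the baseline EER is.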
Abstract
Current speaker recognition technology performs well with the x-vector approach. However, performance degrades when the evaluation domain differs from the training domain, an issue usually addressed with domain adaptation. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Networks (CycleGAN) has received a lot of attention. CycleGANs learn mappings between the features of two domains given non-parallel data. We investigate their effectiveness in a low-resource scenario, i.e., when only a limited amount of target-domain data is available for adaptation, a case unexplored in previous work. We experiment with two adaptation tasks: microphone to telephone, and a novel reverberant to clean adaptation, with the end goal of improving speaker recognition performance. The numbers of speakers present in the source and target domains are 7000 and 191…
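To illustrate what "cycle-consistent" means for mappings learned from non-parallel data, the following is a minimal sketch of an L1 cycle-consistency loss. It is not the paper's implementation: the toy mapping functions stand in for neural generators, all names are hypothetical, and the adversarial losses that a full CycleGAN objective also includes are omitted.

```python
from typing import Callable, List

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    source_batch: List[Vector],
    target_batch: List[Vector],
    g_src_to_tgt: Mapping,
    f_tgt_to_src: Mapping,
) -> float:
    """L1 cycle loss: F(G(x)) should recover x and G(F(y)) should recover y.

    The two batches need not be paired or even the same size,
    which is what makes non-parallel data usable.
    """
    forward = sum(
        l1_distance(f_tgt_to_src(g_src_to_tgt(x)), x) for x in source_batch
    ) / len(source_batch)
    backward = sum(
        l1_distance(g_src_to_tgt(f_tgt_to_src(y)), y) for y in target_batch
    ) / len(target_batch)
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, for illustration only).
def shift_up(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


def shift_down(v: Vector) -> Vector:
    return [x - 1.0 for x in v]


def identity(v: Vector) -> Vector:
    return list(v)


source = [[0.0, 1.0], [2.0, 3.0]]               # e.g. source-domain features
target = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]  # unpaired target-domain features

consistent = cycle_consistency_loss(source, target, shift_up, shift_down)
inconsistent = cycle_consistency_loss(source, target, shift_up, identity)
print(consistent, inconsistent)
```

When the two mappings invert each other, the loss is zero. When they do not, each round trip leaves a residual that the loss penalizes, which constrains the learned mappings even though no source sample is paired with a target sample.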
