Deep Feature CycleGANs: Speaker Identity Preserving Non-parallel Microphone-Telephone Domain Adaptation for Speaker Verification
Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

TL;DR
This paper introduces a CycleGAN-based method for microphone-to-telephone domain adaptation in speaker verification, using deep feature space modifications to preserve speaker identity and improve verification accuracy.
Contribution
It proposes a novel CycleGAN variant whose identity, cycle-consistency, and adversarial losses operate in a deep feature space, a task-specific modification intended to preserve speaker identity during domain adaptation.
Findings
Achieved 5-10% relative EER improvement.
Demonstrated effectiveness on challenging real data.
Analyzed hyper-parameter sensitivity and introduced adaptation probability.
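The headline result is stated as a relative EER improvement. As a reminder of what that measures, here is a small NumPy sketch. It assumes the standard definition of equal error rate (the operating point where miss and false-alarm rates coincide) and finds it with a plain threshold sweep. The function names are hypothetical, and this is not the scoring tool used in the paper.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate equal error rate from verification trial scores.

    Each observed score is tried as a threshold; the EER is taken as the mean of
    the miss and false-alarm rates at the threshold where they are closest.
    """
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Miss: a same-speaker (target) trial scored below the threshold.
    p_miss = np.array([np.mean(tgt < t) for t in thresholds])
    # False alarm: a different-speaker (non-target) trial scored at or above it.
    p_fa = np.array([np.mean(non >= t) for t in thresholds])
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return float((p_miss[i] + p_fa[i]) / 2.0)

def relative_improvement(eer_baseline, eer_new):
    """Relative EER reduction as a fraction of the baseline EER."""
    return (eer_baseline - eer_new) / eer_baseline

print(compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6]))  # 0.25
print(relative_improvement(5.0, 4.5))  # 0.1
```

For instance, a baseline EER of 5.0% falling to 4.5% is a 10% relative improvement, the upper end of the 5-10% range listed above.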
Abstract
With the increase in the availability of speech from varied domains, it is imperative to use such out-of-domain data to improve existing speech systems. Domain adaptation is a prominent pre-processing approach for this. We investigate it for adapting microphone speech to the telephone domain. Specifically, we explore CycleGAN-based unpaired translation of microphone data to improve the x-vector/speaker embedding network for telephony speaker verification. We first demonstrate the efficacy of this on real, challenging data and then, to improve further, we modify the CycleGAN formulation to make the adaptation task-specific. We modify CycleGAN's identity loss, cycle-consistency loss, and adversarial loss to operate in the deep feature space. Deep features of a signal are extracted from an auxiliary (speaker embedding) network and, hence, preserve speaker identity. Our 3D convolution-based…
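The abstract moves the identity, cycle-consistency, and adversarial losses into the deep feature space of an auxiliary speaker-embedding network. A minimal NumPy sketch of where that network enters each loss term follows, under loud assumptions: the linear generators, the tanh "auxiliary network" `phi`, the frame-level discriminator `D_tel`, the L1 distance, and the weights `lambda_cyc`/`lambda_id` are all hypothetical stand-ins, not the authors' 3D-convolution models or settings. No training loop is included.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, DEEP_DIM, FRAMES = 40, 16, 100  # hypothetical sizes

# Hypothetical frozen auxiliary network phi: acoustic frames -> deep features.
W_aux = rng.standard_normal((FEAT_DIM, DEEP_DIM)) / np.sqrt(FEAT_DIM)

def phi(x):
    return np.tanh(x @ W_aux)

def make_generator(scale):
    # Toy linear "generator" (hypothetical); the paper uses 3D-convolution networks.
    A = np.eye(FEAT_DIM) + scale * rng.standard_normal((FEAT_DIM, FEAT_DIM))
    return lambda x: x @ A

# Hypothetical telephone-domain discriminator acting on deep features:
# one least-squares score per frame, loosely like PatchGAN's per-patch scores.
w_d = rng.standard_normal(DEEP_DIM)

def D_tel(h):
    return h @ w_d

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def deep_feature_losses(x_mic, x_tel, g_mt, g_tm, aux_net, disc_tel,
                        lambda_cyc=10.0, lambda_id=5.0):
    """Generator-side losses with every term computed on aux_net(.) features."""
    fake_tel = g_mt(x_mic)
    # Cycle consistency: mic -> tel -> mic and tel -> mic -> tel, compared in deep-feature space.
    cyc = (l1(aux_net(g_tm(fake_tel)), aux_net(x_mic))
           + l1(aux_net(g_mt(g_tm(x_tel))), aux_net(x_tel)))
    # Identity: a generator fed data already in its target domain should not change the deep features.
    idt = (l1(aux_net(g_mt(x_tel)), aux_net(x_tel))
           + l1(aux_net(g_tm(x_mic)), aux_net(x_mic)))
    # Least-squares adversarial term on deep features (mic -> tel direction only;
    # a full CycleGAN adds the symmetric term with a microphone-domain discriminator).
    adv = float(np.mean((disc_tel(aux_net(fake_tel)) - 1.0) ** 2))
    total = adv + lambda_cyc * cyc + lambda_id * idt
    return total, {"adv": adv, "cyc": cyc, "idt": idt}

G_mt, G_tm = make_generator(0.05), make_generator(0.05)
x_mic = rng.standard_normal((FRAMES, FEAT_DIM))
x_tel = rng.standard_normal((FRAMES, FEAT_DIM))
total, parts = deep_feature_losses(x_mic, x_tel, G_mt, G_tm, phi, D_tel)
print(f"total={total:.3f}", parts)
```

Passing identity functions as both generators drives the cycle and identity terms to exactly zero, which is a quick sanity check that only the deep features, not the raw frames, are being compared.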
Taxonomy
Methods: Batch Normalization · Residual Connection · GAN Least Squares Loss · Sigmoid Activation · Convolution · Cycle Consistency Loss · PatchGAN · Residual Block · Instance Normalization
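Several taxonomy entries (GAN least squares loss, PatchGAN, instance normalization) are standard CycleGAN building blocks. A generic NumPy sketch of them follows, assuming `(batch, channels, ...)` arrays and the 0.5 weighting on the discriminator loss that is common in CycleGAN implementations. None of it is taken from the paper's code, and the function names are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) slice over all remaining axes.

    Works for 2D maps (N, C, H, W) and 3D-convolution maps (N, C, D, H, W).
    No learned scale/shift is applied in this sketch.
    """
    axes = tuple(range(2, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss averaged over every PatchGAN patch score:
    real patches are pushed toward 1, generated patches toward 0."""
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    """Generator side: push the discriminator's patch scores on fakes toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

# A PatchGAN-style discriminator emits a grid of scores, not one scalar.
rng = np.random.default_rng(0)
scores_shape = (2, 1, 8, 8)  # hypothetical: batch of 2, one 8x8 grid of patch scores
print(lsgan_d_loss(np.ones(scores_shape), np.zeros(scores_shape)))  # 0.0 (perfect discriminator)
print(lsgan_g_loss(np.zeros(scores_shape)))  # 1.0
feats = instance_norm(rng.standard_normal((2, 4, 3, 5, 6)))  # 3D-conv style feature map
```

Averaging the squared error over the whole score grid is what lets a PatchGAN judge local patches independently while still producing a single scalar loss.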
