Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification
Li Zhang, Qing Wang, Kong Aik Lee, Lei Xie, Haizhou Li

TL;DR
This paper introduces a transfer learning approach for far-field speaker verification that applies feature-level and instance-level knowledge transfer within a teacher-student framework to learn a domain-invariant embedding space, yielding a 13.9% relative EER reduction on the FFSVC 2020 Full-eval trials.
Contribution
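It proposes a contrastive loss for feature-level knowledge transfer and a pairwise distance transfer method for instance-level knowledge transfer, both within a teacher-student framework for domain adaptation in speaker verification, and reports gains over existing approaches on FFSVC 2020.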
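To make the feature-level idea concrete, the following is a minimal sketch of a contrastive transfer loss between student and teacher embeddings. It is not the paper's formulation: it assumes an InfoNCE-style objective over cosine similarities, and the function names and the `temperature` parameter are hypothetical choices made for illustration. Each student embedding is pulled toward teacher embeddings of the same speaker and pushed away from teacher embeddings of other speakers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_transfer_loss(student, teacher, labels, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact loss).

    student[i] and teacher[i] are embeddings of the same utterance;
    labels[i] is its speaker ID. temperature is a hypothetical knob.
    """
    total, count = 0.0, 0
    for i, s in enumerate(student):
        # Similarity of this student embedding to every teacher embedding.
        logits = [cosine(s, t) / temperature for t in teacher]
        # Numerically stable log-sum-exp over all teacher embeddings.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Positives: teacher embeddings sharing the speaker label
        # (this always includes index i itself).
        for j, label in enumerate(labels):
            if label == labels[i]:
                total += log_denom - logits[j]
                count += 1
    return total / count


# Toy 2-D embeddings for two speakers (made-up values).
teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
aligned = [[1.0, 0.05], [0.95, 0.1], [0.05, 1.0], [0.1, 0.95]]
swapped = list(reversed(aligned))  # student places speakers the wrong way round

loss_aligned = contrastive_transfer_loss(aligned, teacher, labels)
loss_swapped = contrastive_transfer_loss(swapped, teacher, labels)
```

In this toy setup the loss is lower when the student places each speaker near the teacher's embeddings of that speaker, which is the behaviour the feature-level transfer is meant to encourage.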
Findings
13.9% relative reduction in EER on the FFSVC 2020 Full-eval trials
6.3% reduction in minDCF compared to winner's DenseNet
Performance close to fusion systems on challenging trials
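For readers unfamiliar with the metrics above, here is a small sketch of how an equal error rate (EER) and a relative reduction can be computed. It is a simple threshold sweep, not the official FFSVC scoring tool, and all scores and numbers in it are made up for illustration rather than taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where false-accept and false-reject rates
    are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, new):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - new) / baseline


# Made-up trial scores, not from the paper.
targets = [0.9, 0.8, 0.3, 0.7]
nontargets = [0.1, 0.2, 0.6, 0.4]
eer = equal_error_rate(targets, nontargets)

# Made-up metric values: an error dropping from 5.0 to 4.0 is a 20% relative reduction.
example_reduction = relative_reduction(5.0, 4.0)
```

A relative reduction is measured against the baseline's own error, so a figure such as 13.9% describes the fraction of the baseline error that was removed, not a drop in absolute percentage points.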
Abstract
In far-field speaker verification, the performance of speaker embeddings is susceptible to degradation when there is a mismatch between the conditions of enrollment and test speech. To solve this problem, we propose feature-level and instance-level transfer learning in a teacher-student framework to learn a domain-invariant embedding space. For feature-level knowledge transfer, we develop a contrastive loss to transfer knowledge from the teacher model to the student model, which not only decreases the intra-class distance but also enlarges the inter-class distance. Moreover, we propose an instance-level pairwise distance transfer method that forces the student model to preserve the pairwise instance distances of the well-optimized embedding space of the teacher model. On the FFSVC 2020 evaluation set, our EER on Full-eval trials is relatively reduced by 13.9% compared with the fusion…
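The instance-level idea described in the abstract can be sketched as follows. This is an illustrative assumption rather than the paper's exact loss: it uses Euclidean distance and a mean squared difference between the student's and the teacher's pairwise distances within a batch, and the function names are hypothetical.

```python
import math


def pairwise_distances(embeddings):
    """Matrix of Euclidean distances between all embeddings in a batch."""
    n = len(embeddings)
    return [
        [math.dist(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


def pairwise_distance_transfer_loss(student, teacher):
    """Mean squared gap between student and teacher pairwise distances.

    Assumed form, not the paper's exact loss. student[i] and teacher[i]
    are embeddings of the same instance from the two models.
    """
    d_student = pairwise_distances(student)
    d_teacher = pairwise_distances(teacher)
    n = len(student)
    # Each unordered pair of instances is counted once.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    squared_gaps = [(d_student[i][j] - d_teacher[i][j]) ** 2 for i, j in pairs]
    return sum(squared_gaps) / len(pairs)


# Toy 2-D embeddings (made-up values).
teacher = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shifted = [[1.0, 1.0], [4.0, 5.0], [7.0, 9.0]]   # same geometry, translated
scaled = [[0.0, 0.0], [6.0, 8.0], [12.0, 16.0]]  # geometry stretched by 2

loss_shifted = pairwise_distance_transfer_loss(shifted, teacher)
loss_scaled = pairwise_distance_transfer_loss(scaled, teacher)
```

Because the loss depends only on distances between instances, a student whose embeddings are a translated copy of the teacher's incurs no penalty, while a student that distorts the relative geometry does. This is one way to read "preserve pairwise instance distances" without forcing the two embedding spaces to coincide point by point.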
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
