Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification

Li Zhang; Qing Wang; Kong Aik Lee; Lei Xie; Haizhou Li

arXiv:2106.09320·cs.SD·June 18, 2021

Open Access

TL;DR

This paper introduces a transfer learning approach for far-field speaker verification that applies feature-level and instance-level knowledge transfer within a teacher-student framework to learn a domain-invariant embedding space, reducing error rates on the FFSVC 2020 evaluation set.

Contribution

It proposes novel contrastive and pairwise distance transfer methods for domain adaptation in speaker verification, outperforming existing approaches.

Findings

01. 13.9% relative reduction in EER on Full-eval trials of the FFSVC 2020 evaluation set

02. 6.3% reduction in minDCF compared to winner's DenseNet

03. Performance close to fusion systems on challenging trials

Abstract

In far-field speaker verification, the performance of speaker embeddings degrades when the conditions of the enrollment and test speech are mismatched. To solve this problem, we propose feature-level and instance-level transfer learning in a teacher-student framework to learn a domain-invariant embedding space. For feature-level knowledge transfer, we develop a contrastive loss that transfers knowledge from the teacher model to the student model; it both decreases the intra-class distance and enlarges the inter-class distance. Moreover, we propose an instance-level pairwise distance transfer method that forces the student model to preserve the pairwise instance distances of the teacher model's well-optimized embedding space. On FFSVC 2020 evaluation set, our EER on Full-eval trials is relatively reduced by 13.9% compared with the fusion…
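The two transfer objectives named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption: a generic margin-based contrastive loss between teacher and student embeddings stands in for the feature-level transfer, and a squared difference of pairwise Euclidean distances stands in for the instance-level transfer. The function names, the `margin` parameter, and the choice of Euclidean distance are hypothetical and are not taken from the paper.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_transfer_loss(teacher, student, labels, margin=1.0):
    """Assumed feature-level loss (generic contrastive form, not the paper's).

    Every teacher embedding is compared with every student embedding.
    Same-speaker pairs are pulled together (squared distance), and
    different-speaker pairs are pushed apart until they exceed `margin`.
    """
    total, count = 0.0, 0
    for i, t in enumerate(teacher):
        for j, s in enumerate(student):
            d = l2_distance(t, s)
            if labels[i] == labels[j]:
                total += d ** 2                      # shrink intra-class distance
            else:
                total += max(0.0, margin - d) ** 2   # enlarge inter-class distance
            count += 1
    return total / count


def pairwise_distance_transfer_loss(teacher, student):
    """Assumed instance-level loss (generic distance matching, not the paper's).

    Penalizes the student when the distance between two of its embeddings
    differs from the distance between the corresponding teacher embeddings.
    """
    n = len(teacher)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_teacher = l2_distance(teacher[i], teacher[j])
            d_student = l2_distance(student[i], student[j])
            total += (d_teacher - d_student) ** 2
            count += 1
    return total / count if count else 0.0


# Toy batch: two speakers, one 2-dimensional embedding each (illustrative values).
teacher_emb = [[1.0, 0.0], [0.0, 1.0]]
student_emb = [[2.0, 0.0], [0.0, 2.0]]
speaker_ids = [0, 1]

feature_loss = contrastive_transfer_loss(teacher_emb, student_emb, speaker_ids)
instance_loss = pairwise_distance_transfer_loss(teacher_emb, student_emb)
```

In a training loop these two terms would typically be added, with weights, to the student's speaker classification loss. The weighting, and which model (near-field or far-field) plays the teacher role, are not stated in this excerpt and are left open here.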

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders