Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Zhenyu Wang; Wei Xia; John H.L. Hansen

arXiv:2009.02444·eess.AS·September 10, 2020

TL;DR

This paper introduces a cross-domain adaptation method for forensic speaker verification that aligns domain-specific distributions in the embedding space, improving performance across diverse acoustic environments.

Contribution

It proposes a discrepancy minimization approach based on maximum mean discrepancy (MMD) for cross-domain adaptation in forensic speaker verification, and evaluates it on the new multi-environment CRSS-Forensics dataset.
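To make the MMD idea concrete, the following is a minimal sketch of a biased squared-MMD estimate between two batches of speaker embeddings. It is an illustration only: the Gaussian (RBF) kernel, the bandwidth `sigma`, the embedding dimension, and the function names `rbf_kernel` and `mmd2` are all assumptions made here, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances, clipped to guard against
    # small negative values caused by floating-point rounding.
    sq = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    sq = np.maximum(sq, 0.0)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 16-dimensional embeddings standing in for two domains.
    clean = rng.normal(0.0, 1.0, size=(200, 16))
    same_domain = rng.normal(0.0, 1.0, size=(200, 16))
    shifted_domain = rng.normal(1.5, 1.0, size=(200, 16))
    print("same domain:   ", mmd2(clean, same_domain, sigma=4.0))
    print("shifted domain:", mmd2(clean, shifted_domain, sigma=4.0))
```

In a training setup, a term like this would typically be added to the speaker classification loss so that embeddings from different acoustic environments are pulled toward a common distribution. How the paper weights or places such a term is not stated in this summary.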

Findings

01. Cross-domain adaptation improves speaker verification accuracy.

02. Diverse acoustic environments significantly impact performance.

03. Discrepancy minimization enhances generalization across domains.

Abstract

Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and loss in performance. Alternatively, cross-domain speaker verification for multiple acoustic environments is a challenging task which could advance research in audio forensics. In this study, we introduce a CRSS-Forensics audio dataset collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing