Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification
Zhenyu Wang, Wei Xia, John H.L. Hansen

TL;DR
This paper introduces a cross-domain adaptation method for forensic speaker verification that aligns domain-specific distributions in the embedding space, improving performance across diverse acoustic environments.
Contribution
It proposes a novel discrepancy minimization approach that uses maximum mean discrepancy (MMD) for cross-domain adaptation in forensic speaker verification, leveraging a new multi-environment dataset, CRSS-Forensics.
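MMD measures how far apart two sets of samples are by comparing their mean embeddings in a kernel feature space. The sketch below computes a biased estimate of squared MMD between a batch of source-domain and a batch of target-domain speaker embeddings. It is an illustration only, not the authors' implementation: the Gaussian kernel, the bandwidths in `sigmas`, and the function names `gaussian_kernel` and `mmd2` are assumptions.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2(source, target, sigmas=(1.0, 2.0, 4.0)):
    """Biased squared-MMD estimate using a sum of Gaussian kernels.

    Assumption: the multi-bandwidth Gaussian kernel is an illustrative
    choice; the paper's exact kernel and weighting are not given here.
    """
    total = 0.0
    for s in sigmas:
        k_ss = gaussian_kernel(source, source, s).mean()  # within source
        k_tt = gaussian_kernel(target, target, s).mean()  # within target
        k_st = gaussian_kernel(source, target, s).mean()  # across domains
        total += k_ss + k_tt - 2.0 * k_st
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(0.0, 1.0, size=(200, 16))      # e.g. clean-speech embeddings
    tgt_near = rng.normal(0.0, 1.0, size=(200, 16))  # same distribution
    tgt_far = rng.normal(1.0, 1.0, size=(200, 16))   # shifted "field" domain
    print("MMD^2, matched domains:", mmd2(src, tgt_near))
    print("MMD^2, shifted domain:", mmd2(src, tgt_far))
```

In a training setup of this kind, a weighted MMD term would typically be added to the speaker-classification loss so that gradient descent pulls the embedding distributions of the two domains together; how the authors weight and schedule this term is not stated in this summary.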
Findings
Cross-domain adaptation improves speaker verification accuracy.
Diverse acoustic environments significantly impact performance.
Discrepancy minimization enhances generalization across domains.
Abstract
Forensic audio analysis for speaker verification poses unique challenges due to location/scenario uncertainty and the diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures because of domain mismatch and the resulting loss in performance. Cross-domain speaker verification across multiple acoustic environments is a challenging task that could advance research in audio forensics. In this study, we introduce the CRSS-Forensics audio dataset, collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
