Multi-source Domain Adaptation for Text-independent Forensic Speaker Recognition
Zhenyu Wang and John H. L. Hansen

TL;DR
This paper introduces three novel multi-source domain adaptation methods for forensic speaker recognition. The methods address the challenges posed by diverse acoustic environments and improve performance across multiple domains.
Contribution
It proposes domain adversarial training, discrepancy minimization, and moment-matching approaches for effective multi-domain adaptation in forensic speaker recognition.
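This summary does not spell out the paper's moment-matching loss. The sketch below illustrates the general idea under an assumption: it uses a common multi-source formulation in the style of M3SDA. For each moment order (first and second raw moments of the speaker embeddings), it averages the distance between every source domain and the target, and adds the average distance between every pair of source domains. All names here (`raw_moment`, `moment_distance`, the list-of-floats embedding type) are hypothetical. In a real system this value would be added to the speaker-classification loss and minimized through the embedding network.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical types: an embedding is a list of floats; a domain is a batch of embeddings.
Vector = List[float]
Domain = Sequence[Vector]


def raw_moment(batch: Domain, order: int) -> Vector:
    """Per-dimension raw moment E[x^order] over a batch of embeddings."""
    n, dim = len(batch), len(batch[0])
    return [sum(e[d] ** order for e in batch) / n for d in range(dim)]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two moment vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def moment_distance(sources: Sequence[Domain], target: Domain, max_order: int = 2) -> float:
    """Multi-source moment distance (assumed M3SDA-style formulation):
    for each order k, the mean source-to-target distance plus the mean
    source-to-source distance over all pairs of source domains."""
    total = 0.0
    for k in range(1, max_order + 1):
        src = [raw_moment(s, k) for s in sources]
        tgt = raw_moment(target, k)
        # Align every source domain with the target domain.
        total += sum(l2(m, tgt) for m in src) / len(src)
        # Align source domains with each other (requires at least two sources).
        pairs = list(combinations(range(len(src)), 2))
        if pairs:
            total += sum(l2(src[i], src[j]) for i, j in pairs) / len(pairs)
    return total


# Toy example with 1-D "embeddings": two source domains and one target domain.
sources = [[[0.0], [2.0]], [[1.0], [1.0]]]
target = [[1.0], [1.0]]
print(moment_distance(sources, target))  # prints 1.5
```

In the toy example, all first moments agree. The 1.5 comes entirely from the second moments: the first source has E[x^2] = 2, while the second source and the target both have E[x^2] = 1. This is the kind of mismatch (same mean, different spread) that first-order alignment alone would miss.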
Findings
Diverse acoustic environments impact recognition performance.
Domain adversarial training learns domain-invariant features.
Discrepancy minimization improves multi-domain performance.
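The findings do not name the discrepancy measure the paper minimizes. A widely used choice is maximum mean discrepancy (MMD), so the sketch below assumes a Gaussian-kernel MMD. It computes the biased empirical estimate of squared MMD between a source batch and a target batch. The hypothetical `multi_source_mmd` extends it to several source domains by averaging the source-to-target terms. The kernel bandwidth `sigma` is an illustrative default, not a value from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical types: an embedding is a list of floats; a domain is a batch of embeddings.
Vector = List[float]
Domain = Sequence[Vector]


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(p: Domain, q: Domain, sigma: float) -> float:
    """Average kernel value over all cross pairs (a from p, b from q)."""
    return sum(gaussian_kernel(a, b, sigma) for a in p for b in q) / (len(p) * len(q))


def mmd2(xs: Domain, ys: Domain, sigma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))


def multi_source_mmd(sources: Sequence[Domain], target: Domain, sigma: float = 1.0) -> float:
    """Assumed multi-source discrepancy: mean MMD^2 from each source to the target."""
    return sum(mmd2(s, target, sigma) for s in sources) / len(sources)


near = [[0.0], [0.1]]
far = [[5.0], [5.1]]
print(mmd2(near, near))  # identical samples give zero discrepancy
print(mmd2(near, far))   # well-separated samples give a large value (close to 2 here)
```

Minimizing this value with respect to the embedding network pulls the source and target embedding distributions together. The bandwidth controls which differences the kernel is sensitive to. In practice, several bandwidths are often summed rather than one being tuned.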
Abstract
Adapting speaker recognition systems to new environments is a widely used technique for tailoring a well-performing model, learned from large-scale data, to task-specific small-scale data scenarios. However, previous studies focus on single-domain adaptation. This neglects a more practical scenario, common in forensic applications, in which training data are collected from multiple acoustic domains. Audio analysis for forensic speaker recognition poses unique challenges for model training with multi-domain data. These challenges stem from location/scenario uncertainty and from the diversity mismatch between reference recordings and naturalistic field recordings. It is also difficult to use small-scale domain-specific data directly to train complex neural network architectures, because of domain mismatch and the resulting performance loss. Fine-tuning is a commonly used method for adaptation in order to retrain the model with weights…
