Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias
Mufan Sang, Wei Xia, John H.L. Hansen

TL;DR
This paper introduces a teacher-student network approach with explicit inductive bias to enhance forensic speaker verification on small, challenging datasets of short utterances recorded in uncontrolled environments.
Contribution
It proposes a novel knowledge distillation objective function and a fine-tuning strategy to improve speaker verification accuracy in forensic scenarios with limited data.
Findings
The proposed objective function improves teacher-student learning on short utterances.
Fine-tuning with the new strategy outperforms weight decay in domain adaptation.
The approach achieves better verification performance on the forensic corpus.
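The contrast with weight decay in the findings above can be made concrete with a small sketch. Weight decay pulls parameters toward zero, while a penalty anchored at the pre-trained weights pulls them toward the fine-tuning start point. Whether the paper's regularizer takes exactly this form is an assumption; the function names, the `strength` parameter, and the squared-L2 form are hypothetical and chosen only for illustration.

```python
def weight_decay_penalty(weights, strength):
    """Standard L2 weight decay: penalizes distance from zero."""
    return 0.5 * strength * sum(w * w for w in weights)


def start_point_penalty(weights, start_weights, strength):
    """Assumed alternative: penalizes distance from the pre-trained
    (start point) weights instead of from zero."""
    return 0.5 * strength * sum(
        (w - w0) ** 2 for w, w0 in zip(weights, start_weights)
    )


# Hypothetical pre-trained weights, used only for illustration.
pretrained = [0.8, -1.2, 0.3]

# At the start of fine-tuning the weights equal the pre-trained ones.
at_start_decay = weight_decay_penalty(pretrained, strength=0.1)
at_start_anchor = start_point_penalty(pretrained, pretrained, strength=0.1)
```

At the start of fine-tuning the anchored penalty is zero, so it does not push the model away from what it learned on the large out-of-domain data, whereas weight decay is already nonzero and pulls every parameter toward the origin.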
Abstract
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning, which is applied for short utterance forensic speaker verification. The objective function collectively considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. In order to advance the trained deep speaker embedding network to be robust for a small target dataset, we introduce a novel strategy to fine-tune the pre-trained student model towards a forensic target domain by utilizing the model as a finetuning start point…
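The abstract names three terms in the objective: a speaker classification loss, a Kullback-Leibler divergence, and a similarity of embeddings. A minimal sketch of how such terms could be combined follows. The weights `alpha` and `beta`, the `temperature` parameter, the use of cosine distance for the embedding term, and the direction of the KL term are all assumptions made for illustration; the summary does not give the paper's exact formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Speaker classification loss for one example."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between the two posterior distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distillation_loss(student_logits, teacher_logits, student_emb,
                      teacher_emb, label, alpha=1.0, beta=1.0,
                      temperature=1.0):
    """Weighted sum of the three terms (weights are hypothetical)."""
    return (
        cross_entropy(student_logits, label)
        + alpha * kl_divergence(teacher_logits, student_logits, temperature)
        + beta * cosine_distance(student_emb, teacher_emb)
    )
```

When the student reproduces the teacher's posteriors and embedding exactly, the second and third terms vanish and only the classification loss remains, which is the behavior one would expect from a distillation objective of this shape.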
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Knowledge Distillation · Weight Decay
