Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

Mufan Sang; Wei Xia; John H.L. Hansen

arXiv:2009.09556 · eess.AS · September 22, 2020 · 1 citation

TL;DR

This paper introduces a teacher-student network approach with explicit inductive bias to enhance forensic speaker verification on small, challenging datasets of short utterances recorded in uncontrolled environments.

Contribution

It proposes a novel knowledge distillation objective function and a fine-tuning strategy to improve speaker verification accuracy in forensic scenarios with limited data.

Findings

1. The proposed objective function improves teacher-student learning on short utterances.
2. The proposed fine-tuning strategy outperforms conventional weight decay when adapting the model to the forensic target domain.
3. The approach improves verification performance on the forensic corpus.
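The second finding contrasts the fine-tuning strategy with weight decay. As a rough illustration only, the sketch below assumes an L2-SP-style regularizer, which penalizes the distance of the weights from the pre-trained weights w0 rather than from zero. This is one standard way to encode an explicit inductive bias toward a pre-trained model, but the paper's exact formulation may differ. The toy task loss, the function names, and the values of lam, lr, and steps are all hypothetical.

```python
import numpy as np

def weight_decay_penalty(w, lam):
    # Standard L2 weight decay: pulls every weight toward zero.
    # Returns (penalty value, gradient of the penalty).
    return 0.5 * lam * np.sum(w ** 2), lam * w

def start_point_penalty(w, w0, lam):
    # L2-SP-style penalty (assumed form): pulls weights toward the
    # pre-trained solution w0 instead of toward zero.
    d = w - w0
    return 0.5 * lam * np.sum(d ** 2), lam * d

def fine_tune(w0, grad_task, penalty, steps=2000, lr=0.1):
    """Plain gradient descent that uses the pre-trained weights w0 as the start point."""
    w = np.array(w0, dtype=float)  # copy, so w0 itself is not modified
    for _ in range(steps):
        _, reg_grad = penalty(w)
        w -= lr * (grad_task(w) + reg_grad)
    return w

# Hypothetical toy target task: one scalar observation y that only
# constrains the first weight, via the squared loss 0.5 * (a @ w - y) ** 2.
a = np.array([1.0, 0.0])
y = 1.5
grad_task = lambda w: (a @ w - y) * a

w_pre = np.array([1.0, 2.0])  # stands in for the pre-trained student weights
lam = 0.1

w_wd = fine_tune(w_pre, grad_task, lambda w: weight_decay_penalty(w, lam))
w_sp = fine_tune(w_pre, grad_task, lambda w: start_point_penalty(w, w_pre, lam))

print("weight decay:", w_wd)  # second weight is driven close to 0
print("start point: ", w_sp)  # second weight stays at its pre-trained value 2.0
```

In this toy setup the small target set constrains only the first weight. Weight decay shrinks the unconstrained second weight toward zero, discarding what the pre-trained model encoded, whereas the start-point penalty leaves it at its pre-trained value and only moves the weight the target data actually inform.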

Abstract

In forensic applications, it is very common that only small naturalistic datasets are available, consisting of short utterances recorded in complex or unknown acoustic environments. In this study, we propose a pipeline solution to improve speaker verification on a small, real forensic field dataset. To leverage large-scale out-of-domain datasets, we propose a knowledge distillation-based objective function for teacher-student learning and apply it to short-utterance forensic speaker verification. The objective function jointly considers the speaker classification loss, the Kullback-Leibler divergence, and the similarity of embeddings. To make the trained deep speaker embedding network robust on a small target dataset, we introduce a novel strategy to fine-tune the pre-trained student model towards the forensic target domain by utilizing the model as a fine-tuning start point…
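The abstract names three ingredients of the objective: a speaker classification loss, a Kullback-Leibler divergence, and an embedding similarity term. The NumPy sketch below combines them under stated assumptions: cross-entropy for classification, KL from temperature-softened teacher posteriors to student posteriors (scaled by T², following the common Hinton-style distillation convention), and one minus the cosine similarity between student and teacher embeddings. The weights alpha and beta, the temperature, and the function names are hypothetical, not values or definitions taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_objective(student_logits, teacher_logits, labels,
                 student_emb, teacher_emb,
                 alpha=1.0, beta=1.0, temperature=2.0, eps=1e-12):
    """Hypothetical combined teacher-student loss; returns (total, per-term dict)."""
    n = student_logits.shape[0]

    # 1) Speaker classification: cross-entropy of the student against true speaker labels.
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(n), labels] + eps))

    # 2) KL(teacher || student) between temperature-softened posteriors.
    q_t = softmax(teacher_logits, temperature)
    q_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(q_t * (np.log(q_t + eps) - np.log(q_s + eps)), axis=-1))

    # 3) Embedding similarity: 1 - cosine similarity, so aligned embeddings cost ~0.
    cos = np.sum(student_emb * teacher_emb, axis=-1) / (
        np.linalg.norm(student_emb, axis=-1) * np.linalg.norm(teacher_emb, axis=-1) + eps)
    sim = np.mean(1.0 - cos)

    # T^2 scaling keeps the KL term's gradient scale comparable across temperatures
    # (assumed convention, not confirmed by the source).
    total = ce + alpha * temperature ** 2 * kl + beta * sim
    return total, {"ce": ce, "kl": kl, "sim": sim}

# Usage with random stand-in batches (4 utterances, 10 speakers, 8-dim embeddings).
rng = np.random.default_rng(0)
labels = np.array([0, 3, 5, 9])
total, parts = kd_objective(rng.normal(size=(4, 10)), rng.normal(size=(4, 10)), labels,
                            rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(total, parts)
```

When the student reproduces the teacher's logits and embeddings exactly, the KL and similarity terms vanish and only the classification loss remains; the two distillation terms therefore act as extra pressure toward the teacher on top of ordinary speaker classification.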


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Knowledge Distillation · Weight Decay