Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Wenhao Yang; Jianguo Wei; Wenhuan Lu; Xugang Lu; Lei Li

arXiv:2409.09389·eess.AS·September 17, 2024

Open Access

TL;DR

This paper introduces IML-KD, a novel knowledge distillation method that transfers multi-scale temporal speech features from teacher to student models, significantly improving speaker verification accuracy.

Contribution

The paper proposes a new multi-level KD approach that incorporates temporal context and gradient-based representations to better capture speech dynamics.

Findings

01. IML-KD reduces the equal error rate (EER) by 5% on VoxCeleb1.

02. The method outperforms existing KD approaches in speaker verification.

03. Enhanced transfer of temporal speech features improves model performance.

Abstract

Knowledge distillation (KD) is widely used in audio tasks, such as speaker verification (SV), by transferring knowledge from a well-trained large model (the teacher) to a smaller, more compact model (the student) for efficiency and portability. Existing KD methods for SV often mirror those used in image processing, focusing on approximating predicted probabilities and hidden representations. However, these methods fail to account for the multi-level temporal properties of speech audio. In this paper, we propose a novel KD method, Integrated Multi-level Knowledge Distillation (IML-KD), to transfer knowledge of various temporal-scale features of speech from a teacher model to a student model. In IML-KD, temporal context information from the teacher model is integrated into novel Integrated Gradient-based input-sensitive representations from speech segments with various…
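
For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of a conventional KD objective that approximates the teacher's predicted probabilities and a hidden representation. It is not the paper's IML-KD method. The function names, the temperature, and the weighting factor alpha are illustrative assumptions, and the inputs are plain Python lists rather than real model outputs.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def baseline_kd_loss(teacher_logits, student_logits,
                     teacher_emb, student_emb,
                     temperature=2.0, alpha=0.5):
    """Hypothetical baseline: soft-label term plus embedding-matching term.

    temperature and alpha are assumed hyperparameters, not values
    taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-label term: the student mimics the teacher's softened posteriors.
    soft_term = (temperature ** 2) * kl_divergence(p_teacher, p_student)
    # Hidden-representation term: the student mimics the teacher's embedding.
    hidden_term = mse(teacher_emb, student_emb)
    return alpha * soft_term + (1.0 - alpha) * hidden_term
```

Both terms operate on a single utterance-level output, which is the limitation the abstract points to: nothing in this objective distinguishes speech segments of different temporal scales.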

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Knowledge Distillation