Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification
Wenhao Yang, Jianguo Wei, Wenhuan Lu, Xugang Lu, Lei Li

TL;DR
This paper introduces IML-KD, a novel knowledge distillation method that transfers multi-scale temporal speech features from teacher to student models, significantly improving speaker verification accuracy.
Contribution
The paper proposes a new multi-level KD approach that incorporates temporal context and gradient-based representations to better capture speech dynamics.
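The abstract names the gradient-based representations as Integrated Gradient-based. As a rough illustration of standard Integrated Gradients only (not the paper's formulation, which this summary does not spell out), the sketch below attributes the output of a toy logistic scorer to its input features. The names `toy_model`, `toy_model_grad`, and `integrated_gradients`, the all-zero baseline, and the weights are hypothetical choices made for the example.

```python
import numpy as np

def toy_model(x, w):
    """Hypothetical stand-in for a speaker network: a logistic scorer."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def toy_model_grad(x, w):
    """Analytic gradient of toy_model with respect to the input x."""
    y = toy_model(x, w)
    return y * (1.0 - y) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from `baseline` to `x`,
    then scaled by the input difference.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [toy_model_grad(baseline + a * (x - baseline), w) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

# Assumed toy input (e.g. three acoustic features) and weights.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, -0.1, 0.8])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w)
# Completeness check: attributions should sum to f(x) - f(baseline).
print(attributions.sum(), toy_model(x, w) - toy_model(baseline, w))
```

The two printed values should agree closely, which is the completeness property that makes these attributions usable as an input-sensitive representation.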
Findings
IML-KD reduces the equal error rate (EER) by 5% on VoxCeleb1.
The method outperforms existing KD approaches in speaker verification.
Enhanced transfer of temporal speech features improves model performance.
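For readers unfamiliar with the metric, EER is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is one simple way to estimate it from trial scores, assuming higher scores mean "same speaker"; the function name `equal_error_rate` and the toy trials are illustrative, not taken from the paper. This summary does not say whether the 5% reduction is relative or absolute.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER from similarity scores and labels (1 = same speaker).

    Sweeps every observed score as a threshold and returns the mean of the
    false acceptance and false rejection rates where they are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = np.sum(labels == 1)
    n_nontarget = np.sum(labels == 0)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.sum(accept & (labels == 0)) / n_nontarget  # false acceptance
        frr = np.sum(~accept & (labels == 1)) / n_target    # false rejection
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy verification trials: one target trial scores below one impostor trial.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # 0.25
```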
Abstract
Knowledge distillation (KD) is widely used in audio tasks such as speaker verification (SV): it transfers knowledge from a well-trained large model (the teacher) to a smaller, more compact model (the student) for efficiency and portability. Existing KD methods for SV often mirror those used in image processing, focusing on approximating predicted probabilities and hidden representations. However, these methods fail to account for the multi-level temporal properties of speech audio. In this paper, we propose a novel KD method, Integrated Multi-level Knowledge Distillation (IML-KD), to transfer knowledge of various temporal-scale features of speech from a teacher model to a student model. In IML-KD, temporal context information from the teacher model is integrated into novel Integrated Gradient-based input-sensitive representations from speech segments with various…
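The conventional KD baseline the abstract contrasts against, matching predicted probabilities and hidden representations, can be sketched as follows. This is a generic illustration and not IML-KD itself: the function names, the temperature of 2.0, the weight `alpha`, and the assumption that teacher and student embeddings share the same dimension are all choices made for the example (in practice a projection layer is often needed when the dimensions differ).

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits,
            student_hidden, teacher_hidden,
            temperature=2.0, alpha=0.5, eps=1e-12):
    """Weighted sum of a soft-label KL term and a hidden-state MSE term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), averaged over the batch and scaled by T^2.
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
        axis=-1,
    ).mean()
    logit_term = temperature ** 2 * kl
    # Mean squared error between embeddings of equal dimension.
    hidden_term = np.mean((student_hidden - teacher_hidden) ** 2)
    return alpha * logit_term + (1.0 - alpha) * hidden_term

# Toy batch: 4 utterances, 10 speaker classes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))
student_logits = rng.normal(size=(4, 10))
teacher_hidden = rng.normal(size=(4, 8))
student_hidden = rng.normal(size=(4, 8))
print(kd_loss(student_logits, teacher_logits, student_hidden, teacher_hidden))
```

Both terms treat each utterance as a single vector, which is the limitation the abstract points to: nothing in this loss reflects temporal structure at different scales.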
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Knowledge Distillation
