Two-Step Knowledge Distillation for Tiny Speech Enhancement
Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic

TL;DR
This paper introduces a two-step knowledge distillation method for tiny speech enhancement models, improving performance especially under high compression and low SNR conditions.
Contribution
It proposes a novel two-phase training process and a fine-grained similarity-preserving loss for better knowledge transfer in tiny speech enhancement models.
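The two-phase schedule can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the student is a single scalar weight, the distillation objective is a stand-in that simply matches the teacher's output, and the names `grad_mse` and `train_two_step` are hypothetical. The only point it shows is the switch: distillation loss alone in the first phase, supervised loss alone in the second, with no weighted mixture at any time.

```python
def grad_mse(w, x, target):
    """Gradient w.r.t. w of mean((w * x - target) ** 2)."""
    return sum(2.0 * (w * xi - ti) * xi for xi, ti in zip(x, target)) / len(x)


def train_two_step(x, teacher_out, clean, kd_epochs, sup_epochs, lr=0.1, w=0.0):
    """Phase 1: fit the teacher's output only. Phase 2: fit the clean target only."""
    history = []
    for epoch in range(kd_epochs + sup_epochs):
        phase = "kd" if epoch < kd_epochs else "supervised"
        target = teacher_out if phase == "kd" else clean
        w -= lr * grad_mse(w, x, target)  # plain gradient descent step
        history.append((phase, w))
    return w, history


# Toy data (assumed): the teacher is slightly off from the clean target.
x = [1.0, -2.0, 0.5, 1.5]
teacher_out = [0.9 * xi for xi in x]
clean = [1.0 * xi for xi in x]

w, history = train_two_step(x, teacher_out, clean, kd_epochs=50, sup_epochs=50)
```

In this toy setting the weight first settles near the teacher's value and then moves to the supervised optimum once the objective is switched; the first phase acts as an initialisation for the second.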
Findings
Achieves a 0.9 dB signal-to-distortion ratio (SDR) gain at -5 dB input SNR
Attains a 1.1 dB SDR gain at 63x compression
Improves most in adverse conditions: high compression and low input SNR
Abstract
Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved by distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we first pre-train the student using only the knowledge distillation (KD) objective, after which we switch to a fully supervised training regime. We also propose a novel fine-grained similarity-preserving KD loss, which aims to match the student's intra-activation Gram matrices to those of the teacher. Our method demonstrates broad improvements, but particularly shines in adverse conditions including high compression and low signal-to-noise ratios (SNR), yielding signal-to-distortion ratio gains of 0.9…
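A Gram-matrix matching loss of the kind described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' definition: the summary does not specify how the intra-activation Gram matrices are formed, so the code assumes one possible reading in which each activation has shape (channels, time) and the Gram matrix is taken over time frames. The names `gram` and `sp_kd_loss`, the row normalisation, and the mean-squared distance are all illustrative choices. The property it demonstrates is that the Gram matrix size does not depend on channel count, so a narrow student can be compared with a wide teacher directly.

```python
import numpy as np


def gram(act):
    """Time-by-time Gram matrix of one activation of shape (channels, time)."""
    g = act.T @ act  # (time, time); the channel axis is summed out
    norm = np.linalg.norm(g, axis=1, keepdims=True)
    return g / np.maximum(norm, 1e-12)  # row-wise L2 normalisation


def sp_kd_loss(student_act, teacher_act):
    """Mean squared difference between student and teacher Gram matrices."""
    gs, gt = gram(student_act), gram(teacher_act)
    return float(np.mean((gs - gt) ** 2))


rng = np.random.default_rng(0)
teacher_act = rng.standard_normal((64, 10))  # assumed wide teacher: 64 channels, 10 frames
student_act = rng.standard_normal((8, 10))   # assumed narrow student: 8 channels, 10 frames

loss_random = sp_kd_loss(student_act, teacher_act)  # unrelated activations
loss_self = sp_kd_loss(teacher_act, teacher_act)    # identical activations
```

Because both Gram matrices are 10 by 10 here, no projection layer is needed to align the 8-channel student with the 64-channel teacher; the loss is zero only when the two activations share the same normalised frame-to-frame similarity structure.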
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Acoustic Wave Phenomena Research
Methods: Knowledge Distillation
