Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance

Diep Luong; Mikko Heikkinen; Konstantinos Drossos; Tuomas Virtanen

arXiv:2505.03442·cs.SD·May 7, 2025

Open Access

TL;DR

This paper introduces a knowledge distillation approach for speech denoising that uses cosine similarity and autoencoder structures to improve student model performance, especially under mismatched conditions.

Contribution

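The paper proposes a knowledge distillation (KD) method that uses cosine similarity and autoencoder principles to improve speech denoising. It targets a limitation of existing KD techniques, which bind the student to the distribution, information ordering, and feature dimensionality learned by the teacher.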

Findings

01

The proposed method outperforms the baseline in various mismatched scenarios.

02

Students retain more denoising capability under mismatched conditions.

03

The approach demonstrates improved performance over state-of-the-art methods.

Abstract

Speech denoising is a widely adopted and impactful task that appears in many everyday use cases. Although very powerful methods have been published, most are too complex for deployment in everyday, low-resource computational environments such as hand-held devices, intelligent glasses, and hearing aids. Knowledge distillation (KD) is a prominent way to alleviate this complexity mismatch; it transfers (distills) knowledge from a pre-trained complex model, the teacher, to a less complex one, the student. Existing KD methods for speech denoising rely on processes that potentially hamper the KD by bounding the learning of the student to the distribution, information ordering, and feature dimensionality learned by the teacher. In this paper, we present and assess a method that tries to treat this issue, by exploiting the…
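To make the idea in the title concrete, the following is a minimal sketch of a cosine-distance alignment term between student and teacher latent representations. It is an illustration under stated assumptions, not the authors' implementation: the function name `cosine_distance_loss`, the `(frames, features)` array layout, and the assumption that both latents already share the same shape are all hypothetical, since the truncated abstract does not give these details.

```python
import numpy as np


def cosine_distance_loss(student_latent, teacher_latent, eps=1e-8):
    """Mean cosine distance (1 - cosine similarity) between paired latent vectors.

    Assumption (not from the paper): both inputs have shape
    (frames, features), and row i of the student is compared with
    row i of the teacher.
    """
    s = np.asarray(student_latent, dtype=np.float64)
    t = np.asarray(teacher_latent, dtype=np.float64)
    if s.shape != t.shape:
        raise ValueError("student and teacher latents must share a shape")

    # Dot product and norm product per frame, along the feature axis.
    dot = np.sum(s * t, axis=-1)
    norms = np.linalg.norm(s, axis=-1) * np.linalg.norm(t, axis=-1)

    # eps guards against division by zero for all-zero vectors.
    cos_sim = dot / np.maximum(norms, eps)
    return float(np.mean(1.0 - cos_sim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 16))  # hypothetical teacher latents

    # Rescaling the student does not change the loss: only direction matters.
    print(cosine_distance_loss(3.0 * teacher, teacher))
    # Opposite direction gives the maximum distance.
    print(cosine_distance_loss(-teacher, teacher))
```

Because cosine distance depends only on the direction of each vector, a term like this does not force the student to match the magnitude of the teacher's activations, which is one plausible reading of how such an objective loosens the constraints described in the abstract. How the paper combines this term with the denoising objective is not stated in the text above.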

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Knowledge Distillation