Comparison of Knowledge Distillation Methods for Low-complexity Multi-microphone Speech Enhancement using the FT-JNF Architecture

Robert Metzger; Mattes Ohlenbusch; Christian Rollwage; Simon Doclo

arXiv:2507.19208 · eess.AS · July 28, 2025

TL;DR

This paper evaluates knowledge distillation techniques to create low-complexity, high-performance multi-microphone speech enhancement models based on the FT-JNF architecture, enabling deployment on resource-constrained devices.

Contribution

The paper systematically compares five KD methods for the FT-JNF architecture and shows that model size can be reduced substantially with minimal loss in speech enhancement performance.

Findings

01. Three KD methods improve student model performance over the baseline.

02. A student model with 25% of the teacher's parameters achieves comparable PESQ scores.

03. Model size can be reduced by up to 96% with only a minimal decrease in PESQ score.

Abstract

Multi-microphone speech enhancement using deep neural networks (DNNs) has significantly progressed in recent years. However, many proposed DNN-based speech enhancement algorithms cannot be implemented on devices with limited hardware resources. Only lowering the complexity of such systems by reducing the number of parameters often results in worse performance. Knowledge Distillation (KD) is a promising approach for reducing DNN model size while preserving performance. In this paper, we consider the recently proposed Frequency-Time Joint Non-linear Filter (FT-JNF) architecture and investigate several KD methods to train smaller (student) models from a large pre-trained (teacher) model. Five KD methods are evaluated using direct output matching, the self-similarity of intermediate layers, and fused multi-layer losses. Experimental results on a simulated dataset using a compact array with…
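The two simplest loss families named in the abstract, direct output matching and self-similarity of intermediate layers, can be illustrated with a minimal pure-Python sketch. The exact loss definitions are not given in the text above, so everything here is an assumption for illustration: mean squared error as the distance, cosine similarity between time frames as the self-similarity measure, and the hypothetical names `output_matching_loss`, `self_similarity_loss`, `kd_objective`, and the weight `alpha`.

```python
import math


def mse(a, b):
    """Mean squared error between two equally long flat sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def self_similarity(features):
    """Cosine similarity between every pair of frames.

    `features` is a list of per-frame feature vectors. The result is a
    frames-by-frames matrix, so its size does not depend on the feature
    dimension of the layer that produced it.
    """
    # Guard against all-zero frames by treating their norm as 1.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    n = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j]))
            / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def output_matching_loss(student_out, teacher_out):
    """Assumed form of direct output matching: MSE on the model outputs."""
    return mse(student_out, teacher_out)


def self_similarity_loss(student_feats, teacher_feats):
    """Assumed form of a self-similarity loss on one intermediate layer."""
    s = self_similarity(student_feats)
    t = self_similarity(teacher_feats)
    return mse([x for row in s for x in row], [x for row in t for x in row])


def kd_objective(task_loss, distill_loss, alpha=0.5):
    """Hypothetical weighting of the task loss and the distillation loss."""
    return (1.0 - alpha) * task_loss + alpha * distill_loss


# Student layer has 2 features per frame, teacher layer has 4.
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]]
layer_loss = self_similarity_loss(student, teacher)
```

Because the similarity matrix is indexed by frame rather than by feature, a narrow student layer can be compared with a wide teacher layer without an extra projection, which is one plausible reason such losses suit students with far fewer parameters than the teacher.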

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation