Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge   Distillation in Speech Enhancement

Runduo Han; Weiming Xu; Zihan Zhang; Mingshuai Liu; Lei Xie

arXiv:2408.04267·cs.SD·August 9, 2024

TL;DR

This paper introduces Distil-DCCRN, a compact speech enhancement model trained with a new knowledge distillation method that combines attention transfer and KL divergence. The result matches or exceeds the larger DCCRN on the reported metrics.

Contribution

The paper presents a new knowledge distillation approach for speech enhancement that transfers knowledge from both the outputs and the intermediate features of a complex teacher model to a smaller student model.
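Attention transfer can work between a teacher and a student whose layers have different channel counts, such as a Uformer teacher and a DCCRN-style student, because each feature map is collapsed across channels before the two are compared. Below is a minimal NumPy sketch of the classic attention-transfer loss. It assumes that paired layers produce (channels, time, freq) maps with matching time and frequency sizes. The layer pairing, the exponent `p`, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attention_map(features: np.ndarray, p: int = 2) -> np.ndarray:
    """Collapse a (channels, time, freq) feature map into a unit-norm
    attention vector: sum over channels of |A_c|^p, flattened, L2-normalized."""
    att = np.sum(np.abs(features) ** p, axis=0).ravel()
    return att / (np.linalg.norm(att) + 1e-12)

def attention_transfer_loss(student_feats, teacher_feats, p: int = 2) -> float:
    """Mean L2 distance between normalized attention maps over paired layers.
    The pairing of student and teacher layers is a hypothetical choice."""
    losses = [
        np.linalg.norm(attention_map(s, p) - attention_map(t, p))
        for s, t in zip(student_feats, teacher_feats)
    ]
    return float(np.mean(losses))
```

For example, a 16-channel student layer can be compared directly with a 64-channel teacher layer of the same (time, freq) shape. Because each map is normalized, the loss ignores overall feature scale and only penalizes differences in where the energy is concentrated.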

Findings

01

Distil-DCCRN has only 30% of DCCRN's parameters.

02

Distil-DCCRN outperforms DCCRN in PESQ and SI-SNR metrics.

03

The method achieves DNSMOS results comparable to those of DCCRN.

Abstract

The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by exploiting the complex features of the audio spectrum. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of DCCRN's parameters. To make the performance of Distil-DCCRN match that of DCCRN, we use knowledge distillation (KD), in which a larger teacher model helps train a smaller student model. We design a KD method that integrates attention transfer and Kullback-Leibler divergence (AT-KL) to train the student model Distil-DCCRN. Additionally, we use Uformer, a model with better performance and a more complex structure, as the teacher model. Unlike previous KD approaches that mainly focus on model outputs, our method also leverages the intermediate features…
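The abstract does not specify which tensors the KL term compares or how the AT and KL terms are weighted. As a hedged illustration, this sketch makes two assumptions. First, the KL term compares teacher and student intermediate feature maps after each map is converted into a softmax distribution over time-frequency bins. Second, the training objective is a weighted sum of a task loss and the two distillation terms. `feature_distribution`, `temperature`, `alpha`, and `beta` are hypothetical names and parameters.

```python
import numpy as np

def feature_distribution(features: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical mapping of a (channels, time, freq) feature map to a
    probability distribution over time-frequency bins: mean channel energy,
    then a temperature-scaled softmax."""
    energy = np.mean(features ** 2, axis=0).ravel() / temperature
    energy -= energy.max()  # stabilize the exponent before exp
    probs = np.exp(energy)
    return probs / probs.sum()

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def feature_kl_loss(student_feats, teacher_feats, temperature: float = 1.0) -> float:
    """Mean KL(teacher || student) over paired intermediate layers."""
    terms = [
        kl_divergence(feature_distribution(t, temperature),
                      feature_distribution(s, temperature))
        for s, t in zip(student_feats, teacher_feats)
    ]
    return float(np.mean(terms))

def total_loss(task_loss: float, at_loss: float, kl_loss: float,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical AT-KL objective: task loss plus weighted distillation terms."""
    return task_loss + alpha * at_loss + beta * kl_loss
```

The teacher distribution is used as the target, as in standard distillation, so the student is pushed toward the teacher's energy pattern rather than the reverse. Averaging over channels keeps the comparison valid when the teacher and student layers have different widths.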

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing