Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement

Jiaming Cheng, Ruiyu Liang, Ye Ni, Chao Xu, Jing Li, Wei Zhou, Rui Liu, Björn W. Schuller, and Xiaoshuai Hao

arXiv:2506.13127·cs.SD·May 18, 2026

TL;DR

This paper introduces a novel knowledge distillation framework for speech enhancement that leverages time-frequency information through recursive fusion and cross-calibration, improving low-complexity model performance.

Contribution

It proposes an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation that exploits the time-frequency differential information of speech to improve speech enhancement.

Findings

01 The proposed method outperforms other distillation schemes in objective evaluations.

02 It effectively improves low-complexity student model performance.

03 The framework is validated on both single-channel and multi-channel datasets.

Abstract

In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (I$^{2}$SRF-TFCKD) for speech enhancement (SE). Unlike previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. First, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. Subsequently, we generate representative features from each correlated set through recursive fusion to form the fused feature set that enables inter-set knowledge interaction. Second, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration,…
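The intra-set and inter-set idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion rule or the calibration weights, so the plain mean-squared-error loss, the fixed blending weight `alpha`, the feature shapes, and the function names `pairwise_distill_loss` and `recursive_fuse` are all assumptions made purely for illustration. The time-frequency cross-calibration step is omitted entirely.

```python
import numpy as np


def pairwise_distill_loss(teacher_feats, student_feats):
    """Average MSE over layer-wise matched teacher/student feature pairs.

    Assumption: plain MSE stands in for the paper's calibrated distillation.
    """
    assert len(teacher_feats) == len(student_feats)
    per_layer = [np.mean((t - s) ** 2) for t, s in zip(teacher_feats, student_feats)]
    return float(np.mean(per_layer))


def recursive_fuse(feats, alpha=0.5):
    """Fold same-shaped features into one representative feature.

    Assumption: a fixed-weight running blend,
    fused_k = alpha * fused_{k-1} + (1 - alpha) * feats[k],
    is a stand-in for the paper's recursive fusion.
    """
    fused = feats[0]
    for f in feats[1:]:
        fused = alpha * fused + (1.0 - alpha) * f
    return fused


rng = np.random.default_rng(0)

# Hypothetical correlated set: three layers, each (time frames, frequency bins).
teacher_set = [rng.standard_normal((10, 8)) for _ in range(3)]
student_set = [t + 0.1 * rng.standard_normal((10, 8)) for t in teacher_set]

# Intra-set term: match features layer by layer within the set.
intra = pairwise_distill_loss(teacher_set, student_set)

# Inter-set term: compare the fused representatives of each set.
inter = pairwise_distill_loss(
    [recursive_fuse(teacher_set)], [recursive_fuse(student_set)]
)

total = intra + inter
print(f"intra={intra:.4f} inter={inter:.4f} total={total:.4f}")
```

In a full system there would be several correlated sets, and the fused representatives from all of them would form the set used for inter-set interaction; the sketch uses a single set to keep the two loss terms easy to tell apart.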

Code & Models

Repositories

JMCheng-SEU/I2S-TFCKD-SE (PyTorch, official)
