TL;DR
This paper introduces a novel knowledge distillation framework for speech enhancement that leverages time-frequency information through recursive fusion and cross-calibration, improving low-complexity model performance.
Contribution
It proposes an intra-set and inter-set recursive fusion framework with time-frequency calibrated distillation that exploits the time-frequency differential information of speech to improve speech enhancement.
Findings
The proposed method outperforms other distillation schemes in objective evaluations.
It effectively improves low-complexity student model performance.
The framework is validated on both single-channel and multi-channel datasets.
Abstract
In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (ISRF-TFCKD) for speech enhancement (SE). Unlike previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. First, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. We then generate a representative feature from each correlated set through recursive fusion, forming a fused feature set that enables inter-set knowledge interaction. Second, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration,…
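The abstract only outlines the intra-set and inter-set scheme, so the following is a minimal NumPy sketch of that structure under loudly stated assumptions, not the authors' implementation. It assumes teacher and student features are already projected to a common (time, frequency) shape, that correlated sets are hypothetical groups of layer indices, that "recursive fusion" is a fixed convex left-to-right blend, and that plain unweighted MSE stands in for the calibrated distillation loss. The dual-stream time-frequency cross-calibration is not modeled. All function names (`recursive_fuse`, `intra_set_loss`, `inter_set_loss`, `isrf_style_loss`) and the `alpha` and `weight` parameters are hypothetical.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two feature maps of equal shape."""
    return float(np.mean((a - b) ** 2))


def recursive_fuse(features, alpha=0.5):
    """Hypothetical recursive fusion: fold the set's features left to right
    with a fixed convex weight, producing one representative feature."""
    fused = features[0]
    for feat in features[1:]:
        fused = alpha * fused + (1.0 - alpha) * feat
    return fused


def intra_set_loss(teacher, student, groups):
    """Pairwise-match every student layer with every teacher layer inside the
    same correlated set (uniform MSE weights are an assumed placeholder for
    the paper's calibration)."""
    total, count = 0.0, 0
    for group in groups:
        for i in group:
            for j in group:
                total += mse(student[i], teacher[j])
                count += 1
    return total / count


def inter_set_loss(teacher, student, groups, alpha=0.5):
    """Fuse each correlated set into one representative feature, then
    pairwise-match fused student features against fused teacher features
    across all sets (inter-set knowledge interaction)."""
    fused_t = [recursive_fuse([teacher[k] for k in g], alpha) for g in groups]
    fused_s = [recursive_fuse([student[k] for k in g], alpha) for g in groups]
    total = sum(mse(s, t) for s in fused_s for t in fused_t)
    return total / (len(fused_s) * len(fused_t))


def isrf_style_loss(teacher, student, groups, alpha=0.5, weight=1.0):
    """Hypothetical total loss: intra-set term plus weighted inter-set term."""
    return intra_set_loss(teacher, student, groups) + weight * inter_set_loss(
        teacher, student, groups, alpha
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical layers of (time=100, freq=257) features.
    teacher = [rng.standard_normal((100, 257)) for _ in range(4)]
    student = [t + 0.1 * rng.standard_normal(t.shape) for t in teacher]
    groups = [[0, 1], [2, 3]]  # two hypothetical correlated sets
    print(isrf_style_loss(teacher, student, groups))
```

Keeping the intra-set and inter-set terms as separate functions mirrors the abstract's split between local matching within a set and global interaction across fused sets; in practice the grouping, fusion rule, and per-pair weights would be where any calibration enters.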
