SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings
Seokhoon Moon, Kyudan Jung, Jaegul Choo

TL;DR
This paper introduces a layer-wise injection method for conditioning in diffusion-based speech enhancement models, improving performance on complex, real-world degradations by propagating conditioning information through all residual blocks.
Contribution
The paper proposes layer-wise injection of degradation conditioning into diffusion models, which outperforms input-level conditioning and improves robustness to real-world speech corruptions.
Findings
Layer-wise injection outperforms input-level conditioning.
The method generalizes well to real-world recordings.
It improves speech enhancement under compound degradations.
Abstract
Real-world speech is often corrupted by multiple degradations simultaneously, including additive noise, reverberation, and nonlinear distortion. Diffusion-based enhancement methods perform well on single degradations but struggle with compound corruptions. Prior noise-aware approaches inject conditioning at the input layer only, which can degrade performance below that of an unconditioned model. To address this, we propose injecting degradation conditioning, derived from a pretrained encoder with multi-task heads for noise type, reverberation, and distortion, into the timestep embedding so that it propagates through all residual blocks without architectural changes. In controlled experiments where only the injection method varies, input-level conditioning performs worse than no encoder at all on compound degradations, while layer-wise injection achieves the best results. The method also…
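The injection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names (`TinyDenoiser`, `ResidualBlock`), the dimensions, the random weights, and the choice to add the projected conditioning vector to a sinusoidal timestep embedding are all assumptions made for illustration. The vector `cond` stands in for the output of the pretrained degradation encoder, which is not modeled here. The point it shows is that once conditioning is folded into the timestep embedding, every residual block that already consumes that embedding sees it, with no change to the blocks themselves.

```python
import math
import random


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a diffusion timestep (a common choice, assumed here)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(x, weight):
    """Multiply vector x by a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ResidualBlock:
    """Toy residual block that is modulated by the (timestep) embedding."""

    def __init__(self, channels, emb_dim, rng):
        self.w_feat = random_matrix(channels, channels, rng)
        self.w_emb = random_matrix(channels, emb_dim, rng)

    def __call__(self, h, emb):
        # Each block projects the shared embedding into its own feature space.
        shift = linear(emb, self.w_emb)
        out = [math.tanh(a + b) for a, b in zip(linear(h, self.w_feat), shift)]
        return [x + y for x, y in zip(h, out)]  # residual connection


class TinyDenoiser:
    """Hypothetical denoiser: conditioning enters via the timestep embedding."""

    def __init__(self, channels=4, emb_dim=8, cond_dim=6, n_blocks=3, seed=0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.w_cond = random_matrix(emb_dim, cond_dim, rng)
        self.blocks = [ResidualBlock(channels, emb_dim, rng) for _ in range(n_blocks)]

    def forward(self, x, t, cond=None):
        emb = timestep_embedding(t, self.emb_dim)
        if cond is not None:
            # Layer-wise injection (assumed additive): fold the degradation
            # embedding into the timestep embedding once, up front.
            emb = [e + c for e, c in zip(emb, linear(cond, self.w_cond))]
        h = list(x)
        for block in self.blocks:
            h = block(h, emb)  # every block receives the conditioned embedding
        return h


model = TinyDenoiser()
noisy = [0.1, -0.2, 0.3, 0.4]
degradation = [1.0, 0.0, 0.0, 0.5, 0.2, 0.0]  # made-up encoder output
unconditioned = model.forward(noisy, t=10)
conditioned = model.forward(noisy, t=10, cond=degradation)
```

By contrast, input-level conditioning would combine `cond` with `x` only before the first block, so later blocks receive it only indirectly through the features. The abstract reports that this variant performs worse than using no encoder at all on compound degradations.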
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques
