From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
Riccardo Miccini, Clément Laroche, Tobias Piechowiak, Xenofon Fafoutis, Luca Pezzarossa

TL;DR
This paper explores how dynamic pruning masks in speech enhancement networks can be used to estimate auxiliary signal properties such as voice activity (VAD) and noise class, eliminating the need for separate models.
Contribution
It demonstrates that internal pruning masks can accurately predict auxiliary speech properties, repurposing dynamic pruning for both enhancement and property estimation.
Findings
Achieved up to 93% accuracy on VAD prediction.
Attained 84% accuracy on noise classification.
Reached an R² of 0.86 on fundamental frequency (F0) estimation.
Abstract
Speech Enhancement (SE) in audio devices is often supported by auxiliary modules for Voice Activity Detection (VAD), SNR estimation, or Acoustic Scene Classification to ensure robust context-aware behavior and seamless user experience. Just like SE, these tasks often employ deep learning; however, deploying additional models on-device is computationally impractical, whereas cloud-based inference would introduce additional latency and compromise privacy. Prior work on SE employed Dynamic Channel Pruning (DynCP) to reduce computation by adaptively disabling specific channels based on the current input. In this work, we investigate whether useful signal properties can be estimated from these internal pruning masks, thus removing the need for separate models. We show that simple, interpretable predictors achieve up to 93% accuracy on VAD, 84% on noise classification, and an R² of 0.86 on F0…
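To make the idea of predicting a signal property from pruning masks concrete, here is a minimal sketch in pure Python. It is not the authors' method: the abstract does not say which predictors were used, so logistic regression stands in as one example of a simple, interpretable predictor. The masks are synthetic, and the channel count, the number of informative channels, and the activation probabilities are all hypothetical values chosen for illustration.

```python
import math
import random


def make_synthetic_masks(n_frames, n_channels=12, n_informative=4, seed=0):
    """Generate hypothetical per-frame binary pruning masks with VAD labels.

    Assumption (not from the paper): the first `n_informative` channels are
    kept active more often during speech; the rest are label-independent.
    """
    rng = random.Random(seed)
    masks, labels = [], []
    for _ in range(n_frames):
        speech = rng.random() < 0.5
        row = []
        for c in range(n_channels):
            if c < n_informative:
                p_active = 0.9 if speech else 0.2
            else:
                p_active = 0.5
            row.append(1.0 if rng.random() < p_active else 0.0)
        masks.append(row)
        labels.append(1 if speech else 0)
    return masks, labels


def predict_proba(w, b, x):
    """Probability of speech for one mask vector under a logistic model."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(masks, labels, lr=0.5, epochs=300):
    """Fit logistic regression with full-batch gradient descent."""
    n, d = len(masks), len(masks[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(masks, labels):
            err = predict_proba(w, b, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def accuracy(w, b, masks, labels):
    """Fraction of frames whose thresholded prediction matches the label."""
    hits = sum(
        (predict_proba(w, b, x) >= 0.5) == bool(y)
        for x, y in zip(masks, labels)
    )
    return hits / len(labels)


if __name__ == "__main__":
    masks, labels = make_synthetic_masks(1000)
    w, b = fit_logistic(masks[:600], labels[:600])
    print("held-out accuracy:", accuracy(w, b, masks[600:], labels[600:]))
    print("per-channel weights:", [round(wi, 2) for wi in w])
```

Because each input is a single on/off flag per channel, the learned weights can be read directly: a large positive weight means that channel tends to stay active during speech. In a real system the masks would come from the pruned SE network rather than a random generator, so the predictor adds very little computation on top of enhancement.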
