Knowing When to Quit: Probabilistic Early Exits for Speech Separation
Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk, Søren Føns Nielsen, Rasmus Malik Høegh Lindrup, Bjørn Sand Jensen, Morten Mørup

TL;DR
This paper introduces a neural network architecture for speech separation and enhancement that supports early-exit, enabling dynamic computation scaling based on uncertainty estimates, suitable for resource-constrained devices.
Contribution
It proposes a probabilistic framework for early-exit in speech separation models, allowing adaptive computation without sacrificing performance.
Findings
Early-exit models maintain quality while reducing compute.
Uncertainty-aware exit conditions are well calibrated for variable-length audio.
Dynamic compute scaling yields significant compute savings.
Abstract
In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use-cases we design a neural network architecture for speech separation and enhancement capable of early-exit, and we propose an uncertainty-aware probabilistic framework to jointly model the clean speech signal and error variance which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks where we demonstrate…
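The idea of exiting once a desired signal-to-noise ratio is reached can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes, hypothetically, that each exit point returns a signal estimate together with a single predicted error variance, and it uses a simple point estimate of the SNR rather than a full probabilistic condition. The names `estimated_snr_db` and `run_with_early_exit` are illustrative only.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# Hypothetical exit output: (signal estimate, predicted per-sample error variance).
ExitOutput = Tuple[Sequence[float], float]


def estimated_snr_db(estimate: Sequence[float], error_variance: float) -> float:
    """SNR in dB implied by an estimate and its predicted error variance."""
    if not estimate:
        raise ValueError("estimate must not be empty")
    if error_variance <= 0.0:
        raise ValueError("error_variance must be positive")
    signal_power = sum(x * x for x in estimate) / len(estimate)
    if signal_power == 0.0:
        return float("-inf")  # a silent estimate never meets a finite target
    return 10.0 * math.log10(signal_power / error_variance)


def run_with_early_exit(
    exits: Iterable[ExitOutput], target_snr_db: float
) -> Tuple[int, List[float], float]:
    """Consume exits in order and stop at the first one meeting the target SNR.

    `exits` may be a generator, so deeper exits are only computed if needed.
    Falls back to the last exit if none reaches the target.
    """
    result = None
    for index, (estimate, error_variance) in enumerate(exits):
        snr_db = estimated_snr_db(estimate, error_variance)
        result = (index, list(estimate), snr_db)
        if snr_db >= target_snr_db:
            break  # good enough: skip the remaining computation
    if result is None:
        raise ValueError("at least one exit is required")
    return result


if __name__ == "__main__":
    # Toy exits whose predicted error variance shrinks with depth.
    toy_exits = [([1.0, -1.0], 1.0), ([1.0, -1.0], 0.01), ([1.0, -1.0], 0.0001)]
    index, _, snr = run_with_early_exit(iter(toy_exits), target_snr_db=15.0)
    print(f"exited at block {index} with estimated SNR {snr:.1f} dB")
```

Passing the exits as a lazy iterable is the point of the design: a lower target SNR stops the loop earlier and so spends less compute, while a higher target runs more of the network.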
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
