Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement
Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Vamsi Krishna Ithapu, Shih-Chii Liu

TL;DR
This paper presents a SlowFast framework for speech enhancement that significantly reduces computation costs for ultra low-latency processing by combining slow environmental analysis with fast, state-space model-based enhancement.
Contribution
The novel SlowFast framework dynamically modulates a state space model in the fast branch using a slow branch, enabling efficient low-latency speech enhancement.
Findings
Reduced computation cost by 70% compared to baseline
Achieved 62.5 μs latency with 100 M MACs/s
Maintained high enhancement performance with PESQ-NB of 3.12
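To put the latency and compute figures above on a per-sample footing, the following sketch converts them into sample counts. The 16 kHz sample rate is an assumption (this summary does not state it), and "100 M MACs/s" is read as 100 million multiply-accumulates per second; the function names are illustrative only.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 16_000  # assumption: the sample rate is not stated in this summary


def latency_in_samples(latency_s: Fraction, sample_rate_hz: int = SAMPLE_RATE_HZ) -> Fraction:
    """Convert a latency in seconds to a number of audio samples."""
    return latency_s * sample_rate_hz


def macs_per_sample(macs_per_second: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> Fraction:
    """Average multiply-accumulate budget available per audio sample."""
    return Fraction(macs_per_second, sample_rate_hz)


print(latency_in_samples(Fraction(625, 10_000_000)))  # 62.5 us -> 1
print(latency_in_samples(Fraction(2, 1000)))          # 2 ms    -> 32
print(macs_per_sample(100_000_000))                   # 6250
```

Under that assumed rate, 62.5 μs corresponds to a single sample, which is why the number of frames to process grows so sharply at ultra low latency.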
Abstract
Deep learning-based speech enhancement (SE) methods often face significant computational challenges when they must meet low-latency requirements, because of the increased number of frames to be processed. This paper introduces the SlowFast framework, which aims to reduce computation costs specifically when low-latency enhancement is needed. The framework consists of a slow branch that analyzes the acoustic environment at a low frame rate, and a fast branch that performs SE in the time domain at the higher frame rate needed to match the required latency. Specifically, the fast branch employs a state space model whose state transition process is dynamically modulated by the slow branch. Experiments on an SE task with a 2 ms algorithmic latency requirement using the Voice Bank + Demand dataset show that our approach reduces computation cost by 70% compared to a baseline single-branch…
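The two-branch idea described in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: the diagonal state space model, the energy-based slow feature, the sigmoid mapping, and every name and parameter below (`slow_branch`, `slowfast_enhance`, `slow_hop`, `n_states`) are assumptions chosen only to show a slow branch refreshing the state transition of a per-sample recurrence.

```python
import math


def slow_branch(frame, n_states):
    """Hypothetical slow branch: map one slow frame to per-state decay factors in (0, 1)."""
    energy = sum(s * s for s in frame) / len(frame)
    # Toy mapping (assumption): a sigmoid of a per-state bias minus a log-energy feature.
    return [1.0 / (1.0 + math.exp(-(k - math.log1p(energy)))) for k in range(n_states)]


def slowfast_enhance(x, slow_hop=32, n_states=4):
    """Toy fast branch: a diagonal SSM run per sample, with its transition set by the slow branch."""
    b = [1.0] * n_states             # input weights (fixed here; learned in a real model)
    c = [1.0 / n_states] * n_states  # output weights (fixed here; learned in a real model)
    h = [0.0] * n_states             # hidden state
    a = [0.5] * n_states             # transition used before the first slow frame is complete
    y = []
    for n, sample in enumerate(x):
        # The slow branch runs once per slow_hop samples and only sees the frame
        # that has just ended, so the fast path never waits on future input.
        if n > 0 and n % slow_hop == 0:
            a = slow_branch(x[n - slow_hop:n], n_states)
        # Fast path: one cheap state update and readout per sample.
        h = [a[k] * h[k] + b[k] * sample for k in range(n_states)]
        y.append(sum(c[k] * h[k] for k in range(n_states)))
    return y


noisy = [math.sin(0.1 * n) for n in range(96)]
enhanced = slowfast_enhance(noisy)
print(len(enhanced))  # one output sample per input sample
```

The point of the split is that the comparatively expensive analysis runs once per slow frame, while the per-sample work stays limited to a small recurrence.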
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
