aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio
Yan Ru Pei, Ritik Shrivastava, FNU Sidharth

TL;DR
aTENNuate is an efficient deep state-space autoencoder for real-time speech enhancement directly on raw audio, outperforming previous models in quality and resource usage, and effective even with low-bandwidth inputs.
Contribution
Introduces aTENNuate, a novel end-to-end deep state-space autoencoder for real-time raw speech enhancement with superior performance and efficiency.
Findings
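Outperforms previous real-time denoising models in PESQ score, parameter count, MACs, and latency on the VoiceBank + DEMAND and Microsoft DNS1 synthetic test sets.
Maintains high fidelity to the clean signal with minimal audible artifacts, despite operating on raw waveforms.
Remains performant even when the noisy input is compressed down to 4000 Hz and 4 bits.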
Abstract
We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even as a raw waveform processing model, it maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to 4000 Hz and 4 bits, suggesting general speech enhancement capabilities in low-resource environments. Try it out with pip install attenuate.
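To make the "4000 Hz and 4 bits" setting concrete, here is a minimal sketch of how such a compressed input could be produced from a 16 kHz waveform. This is not the authors' preprocessing pipeline: the function name `degrade`, the 16 kHz source rate, the block-averaging downsampler, and the uniform quantizer are all assumptions chosen for illustration, and the sine tone merely stands in for noisy speech.

```python
import math

def degrade(samples, src_rate=16000, dst_rate=4000, bits=4):
    """Crudely downsample and quantize a mono signal with values in [-1, 1].

    Hypothetical helper for illustration only; not part of the attenuate package.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only supports integer decimation factors")
    factor = src_rate // dst_rate

    # Average each block of `factor` samples: a crude low-pass filter plus decimation.
    usable = len(samples) - len(samples) % factor
    low_rate = [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

    # Uniform quantization to 2**bits levels spanning [-1, 1].
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    quantized = []
    for x in low_rate:
        x = max(-1.0, min(1.0, x))        # clip to the representable range
        index = round((x + 1.0) / step)   # nearest level index, 0 .. levels - 1
        quantized.append(index * step - 1.0)
    return quantized

# One second of a 440 Hz tone at 16 kHz, standing in for a noisy speech clip.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
degraded = degrade(tone)
```

With these defaults, one second of audio shrinks from 16000 samples to 4000, and each sample takes one of at most 16 distinct values. A real pipeline would more likely use a proper anti-aliasing resampler than block averaging.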
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
