aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio

Yan Ru Pei; Ritik Shrivastava; FNU Sidharth

arXiv:2409.03377 · cs.SD · June 17, 2025

Open Access

TL;DR

aTENNuate is an efficient deep state-space autoencoder for real-time speech enhancement directly on raw audio. It outperforms previous real-time models in both quality and resource usage, and it remains effective even with low-bandwidth inputs.

Contribution

Introduces aTENNuate, a novel end-to-end deep state-space autoencoder for real-time raw speech enhancement with superior performance and efficiency.
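
aTENNuate is built from deep state-space model (SSM) layers, but this page does not describe the paper's exact layer parameterization. As a generic illustration only, here is a minimal sketch that assumes a diagonal complex SSM discretized with zero-order hold, a common choice in the SSM literature that is not necessarily the authors'. It shows why such layers suit real-time use: the same linear layer can be evaluated as a causal convolution over a whole clip, or as a recurrence that consumes one raw-audio sample at a time with constant memory. The class DiagonalSSM, its step and kernel methods, the helper causal_conv, and all parameter values are hypothetical.

```python
import cmath
from typing import List, Sequence


class DiagonalSSM:
    """Hypothetical single-input single-output diagonal SSM (illustration only).

    Continuous system: x'(t) = diag(lam) x(t) + B u(t),  y(t) = Re(C . x(t)),
    discretized with zero-order hold (an assumption, not the paper's stated method).
    """

    def __init__(self, lam: Sequence[complex], B: Sequence[complex],
                 C: Sequence[complex], dt: float):
        # ZOH per mode: A_bar = exp(dt * lam), B_bar = (A_bar - 1) / lam * B (lam != 0).
        self.A_bar = [cmath.exp(dt * l) for l in lam]
        self.B_bar = [(a - 1) / l * b for a, l, b in zip(self.A_bar, lam, B)]
        self.C = list(C)
        self.state = [0j] * len(lam)  # recurrent state, starts at zero

    def step(self, u: float) -> float:
        """Streaming mode: consume one input sample, emit one output sample."""
        self.state = [a * x + b * u
                      for a, x, b in zip(self.A_bar, self.state, self.B_bar)]
        return sum(c * x for c, x in zip(self.C, self.state)).real

    def kernel(self, length: int) -> List[float]:
        """Convolution mode: K[k] = Re(sum_n C_n * A_bar_n**k * B_bar_n)."""
        return [sum(c * a ** k * b
                    for c, a, b in zip(self.C, self.A_bar, self.B_bar)).real
                for k in range(length)]


def causal_conv(u: Sequence[float], K: Sequence[float]) -> List[float]:
    """y[t] = sum_{j=0..t} K[j] * u[t - j] (no access to future samples)."""
    return [sum(K[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


# Usage with arbitrary, hypothetical parameters and a short input signal.
ssm = DiagonalSSM(lam=[-0.5 + 3j, -1.0 + 10j], B=[1, 1],
                  C=[0.5 - 0.2j, 1.0 + 0.1j], dt=0.1)
signal = [1.0, 0.0, -0.5, 0.25, 0.0, 0.3]
streamed = [ssm.step(u) for u in signal]                # sample-by-sample
convolved = causal_conv(signal, ssm.kernel(len(signal)))  # whole clip at once
```

The two outputs agree up to floating-point error. This equivalence is what allows a linear SSM layer to be trained in convolutional form and then run sample by sample in a streaming setting.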

Findings

01

Outperforms previous real-time denoising models in PESQ score, parameter count, MACs, and latency.

02

Maintains high fidelity to the clean signal with minimal audible artifacts, despite operating directly on raw waveforms.

03

Remains effective even when the noisy input is compressed down to 4000 Hz and 4 bits.

Abstract

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even as a raw waveform processing model, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to 4000 Hz and 4 bits, suggesting general speech enhancement capabilities in low-resource environments. Try it out with: pip install attenuate
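
The abstract reports that the model stays performant when the noisy input is compressed down to 4000 Hz and 4 bits, but the exact degradation pipeline is not given here. The sketch below is one plausible way to simulate such an input. It assumes 16 kHz source audio with samples in [-1, 1], box-filter averaging before decimation, a uniform 16-level quantizer, and sample-and-hold upsampling back to the original rate so the length matches. None of these choices is confirmed by the source, and the helper names decimate_box, quantize_uniform, hold_upsample, and degrade are hypothetical.

```python
import math
from typing import List, Sequence


def decimate_box(x: Sequence[float], factor: int) -> List[float]:
    """Crude downsampling (assumed): average each block of `factor` samples."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def quantize_uniform(x: Sequence[float], bits: int) -> List[float]:
    """Mid-rise uniform quantizer on [-1, 1] with 2**bits levels (assumed)."""
    levels = 2 ** bits
    step = 2.0 / levels
    out = []
    for s in x:
        s = min(max(s, -1.0), 1.0)                      # clip to [-1, 1]
        idx = min(int((s + 1.0) / step), levels - 1)    # level index 0..levels-1
        out.append(-1.0 + (idx + 0.5) * step)           # reconstruct at level center
    return out


def hold_upsample(x: Sequence[float], factor: int) -> List[float]:
    """Repeat each sample `factor` times to return to the original rate."""
    return [s for s in x for _ in range(factor)]


def degrade(x: Sequence[float], orig_rate: int = 16000,
            target_rate: int = 4000, bits: int = 4) -> List[float]:
    """Hypothetical low-resource input: 16 kHz -> 4 kHz, 4-bit, back to 16 kHz."""
    factor = orig_rate // target_rate
    return hold_upsample(quantize_uniform(decimate_box(x, factor), bits), factor)


# Usage: 10 ms of a 440 Hz tone at an assumed 16 kHz sampling rate.
clean = [0.8 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
degraded = degrade(clean)
```

With 4 bits the degraded signal can take at most 16 distinct amplitude values, and each 4 kHz sample is held for four 16 kHz samples, which is the kind of coarse input a super-resolution and de-quantization model would have to restore.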

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques