Speech enhancement deep-learning architecture for efficient edge processing
Monisankha Pal, Arvind Ramanathan, Ted Wada, Ashutosh Pandey

TL;DR
This paper introduces an efficient deep learning architecture for speech enhancement designed for low-power edge devices, combining multi-scale features, squeeze-excitation blocks, and a metric GAN to improve speech quality while reducing computational load.
Contribution
The proposed WSR-MGAN architecture effectively balances speech enhancement performance with computational efficiency suitable for edge devices, integrating novel multi-scale and attention mechanisms.
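The attention mechanism named in the TL;DR is the squeeze-excitation block. As a rough illustration of that general technique only, and not the authors' implementation, a squeeze-excitation gate over a 1-D feature map can be sketched in NumPy as below. The function name, tensor layout (channels, time), reduction ratio, and random weights are all assumptions made for this example.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Rescale each channel of x (shape: channels x time) by a learned gate.

    Hypothetical helper for illustration; weights would normally be trained.
    """
    # Squeeze: global average pooling over time gives one value per channel.
    s = x.mean(axis=1)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = np.maximum(0.0, w1 @ s + b1)
    g = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Scale: broadcast the gate across the time axis.
    return x * g[:, None]

# Assumed sizes: 8 channels, 16 time steps, reduction ratio 4.
rng = np.random.default_rng(0)
channels, time_steps, reduction = 8, 16, 4
x = rng.standard_normal((channels, time_steps))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1
b2 = np.zeros(channels)
y = squeeze_excitation(x, w1, b1, w2, b2)
```

Because the gate lies strictly between 0 and 1, the block can only attenuate channels relative to each other; its extra cost is two small matrix-vector products per feature map, which is why this style of attention is often considered cheap next to transformer self-attention.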
Findings
Outperforms baseline models on the VoiceBank+DEMAND dataset
Achieves state-of-the-art results in time-domain speech enhancement
Maintains high speech quality with reduced computational complexity
Abstract
Deep learning has become the de facto method of choice for speech enhancement, bringing significant improvements in speech quality. However, reducing model size and computation to allow real-time processing on low-power edge devices drastically degrades speech quality. Recently, transformer-based architectures have greatly reduced memory requirements and provided ways to improve model performance through local and global contexts. However, transformer operations remain computationally heavy. In this work, we introduce a WaveUNet squeeze-excitation Res2 (WSR)-based metric generative adversarial network (WSR-MGAN) architecture that can be efficiently implemented on low-power edge devices for noise suppression tasks while maintaining speech quality. We extract multi-scale features with Res2Net blocks, which can be related to the spectral content used in speech-processing tasks. In the…
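To make the multi-scale idea concrete, here is a minimal NumPy sketch of the hierarchical channel split used by the general Res2Net design: the channels are divided into groups, the first group passes through unchanged, and each later group is convolved after adding the previous group's output, so the receptive field grows from group to group. This is an illustration under stated assumptions, not the WSR-MGAN block itself: the helper names, the kernel size of 3, the ReLU placement, and the omission of the surrounding 1x1 convolutions are all choices made for this example.

```python
import numpy as np

def conv1d_same(x, w):
    """Zero-padded 'same' 1-D convolution (cross-correlation form).

    x: (c_in, time); w: (c_out, c_in, k) with odd k.
    """
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((c_out, t))
    for j in range(k):
        # Each kernel tap mixes channels on a shifted copy of the input.
        out += w[:, :, j] @ xp[:, j:j + t]
    return out

def res2_block(x, kernels):
    """Hypothetical Res2Net-style block with len(kernels) + 1 channel groups."""
    scale = len(kernels) + 1
    groups = np.split(x, scale, axis=0)  # requires channels % scale == 0
    outputs = [groups[0]]                # first group: identity
    prev = None
    for xi, w in zip(groups[1:], kernels):
        # Later groups also see the previous group's output.
        inp = xi if prev is None else xi + prev
        prev = np.maximum(0.0, conv1d_same(inp, w))
        outputs.append(prev)
    # Residual connection around the whole block.
    return np.concatenate(outputs, axis=0) + x

# Assumed sizes: 8 channels split into 4 groups of 2, 32 time steps.
rng = np.random.default_rng(0)
channels, time_steps, scale, width = 8, 32, 4, 2
x = rng.standard_normal((channels, time_steps))
kernels = [rng.standard_normal((width, width, 3)) * 0.1 for _ in range(scale - 1)]
y = res2_block(x, kernels)
```

With kernel size 3, the last of the four groups effectively covers 7 input samples while the first covers 1, which is the sense in which a single block yields features at several scales.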
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
Methods: 1x1 Convolution · Residual Connection · Res2Net Block · Kaiming Initialization · Average Pooling · Global Average Pooling · Batch Normalization · Convolution · Res2Net
