Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
Haixin Zhao, Nilesh Madhu

TL;DR
This paper introduces a lightweight, causal, transformer-based speech enhancement model that reaches state-of-the-art performance with far fewer parameters and multiply-accumulate operations than comparable models, making it suitable for edge devices.
Contribution
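The paper proposes a streamlined Frequency-Time-Frequency (FTF) stacked-transformer architecture, identified through systematic ablation of transformer-based temporal and spectral modelling. Discriminators are used during training only, so the adversarial setup adds no complexity at inference. The resulting model reduces complexity while matching or improving on the performance of existing models.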
Findings
The proposed model, LCT-GAN, has only 6% of the parameters of DeepFilterNet2 while achieving similar performance.
LCT-GAN uses 9% fewer parameters and 10% fewer multiply-accumulate operations (MACs) than CCFNet+(Lite).
LCT-GAN outperforms more complex baseline models on standard datasets.
Abstract
In speech enhancement, achieving state-of-the-art (SotA) performance while adhering to the computational constraints on edge devices remains a formidable challenge. Networks integrating stacked temporal and spectral modelling effectively leverage improved architectures such as transformers; however, they inevitably incur substantial computational complexity and model expansion. Through systematic ablation analysis on transformer-based temporal and spectral modelling, we demonstrate that the architecture employing streamlined Frequency-Time-Frequency (FTF) stacked transformers efficiently learns global dependencies within causal context, while avoiding considerable computational demands. Utilising discriminators in training further improves learning efficacy and enhancement without introducing additional complexity during inference. The proposed lightweight, causal, transformer-based…
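The Frequency-Time-Frequency stacking described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`self_attention`, `ftf_block`), the single-head attention with identity projections (no learned weights), the residual connections, and the tensor layout `(T, F, C)` are all assumptions made for illustration. The sketch only shows the ordering of the three attention stages and how a causal mask on the time stage keeps the whole stack causal.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, causal=False):
    """Single-head scaled dot-product self-attention with a residual connection.

    x has shape (..., L, C); attention runs over the L axis.
    Assumption: identity query/key/value projections (no learned weights).
    """
    L, C = x.shape[-2], x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(C)      # (..., L, L)
    if causal:
        # Position i may attend only to positions j <= i.
        future = np.triu(np.ones((L, L), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return x + softmax(scores) @ x

def ftf_block(x):
    """Hypothetical FTF stack for x of shape (T, F, C): frames, bins, channels."""
    x = self_attention(x)                    # F stage: across bins, within each frame
    xt = np.swapaxes(x, 0, 1)                # (F, T, C)
    xt = self_attention(xt, causal=True)     # T stage: causal across frames, per bin
    x = np.swapaxes(xt, 0, 1)                # back to (T, F, C)
    return self_attention(x)                 # F stage again

# Usage: perturbing the last frame must not change any earlier output frame.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4, 3))
y = ftf_block(x)
x_future = x.copy()
x_future[-1] += 1.0
y_future = ftf_block(x_future)
```

Because each frequency stage mixes information only within a single frame, the time stage is the only place where frames interact, and masking it is enough to make the full stack causal. A real model would add learned projections, multiple heads, normalisation, and feed-forward layers.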
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques
