EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
Bin Wen, Tien-Ping Tan

TL;DR
EffiFusion-GAN is a lightweight speech enhancement model that combines depthwise separable convolutions, an enhanced attention mechanism, and dynamic pruning to reduce model size while maintaining enhancement quality in resource-constrained settings.
Contribution
The paper introduces EffiFusion-GAN, a lightweight GAN architecture for speech enhancement that integrates a multi-scale block built from depthwise separable convolutions, an attention mechanism with dual normalization and residual refinement, and dynamic pruning.
Findings
Achieves a PESQ score of 3.45 on the VoiceBank+DEMAND dataset.
Outperforms existing models under the same parameter settings.
Reduces model size through dynamic pruning while maintaining performance.
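The summary does not say how the dynamic pruning is carried out. As a rough illustration only, the sketch below uses magnitude-based pruning with a sparsity target that ramps up linearly over training. This is a common generic technique and an assumption here, not the authors' method. The names magnitude_prune and sparsity_schedule, and all numeric values, are hypothetical.

```python
def sparsity_schedule(step, total_steps, final_sparsity):
    """Hypothetical linear ramp of the pruning target from 0 to final_sparsity."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * progress


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list.

    Assumption: pruning by absolute value; the paper may use another criterion.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Toy weights, chosen only for illustration.
example_weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.01]
halfway_target = sparsity_schedule(step=5, total_steps=10, final_sparsity=0.6)
pruned_weights = magnitude_prune(example_weights, sparsity=0.5)
print(halfway_target, pruned_weights)
```

In a setup like this, the mask would be recomputed as training proceeds, so which weights are removed can change from step to step; whether EffiFusion-GAN works this way is not stated in the summary.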
Abstract
We introduce EffiFusion-GAN (Efficient Fusion Generative Adversarial Network), a lightweight yet powerful model for speech enhancement. The model integrates depthwise separable convolutions within a multi-scale block to capture diverse acoustic features efficiently. An enhanced attention mechanism with dual normalization and residual refinement further improves training stability and convergence. Additionally, dynamic pruning is applied to reduce model size while maintaining performance, making the framework suitable for resource-constrained environments. Experimental evaluation on the public VoiceBank+DEMAND dataset shows that EffiFusion-GAN achieves a PESQ score of 3.45, outperforming existing models under the same parameter settings.
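To make the efficiency argument concrete, the sketch below compares the weight count of standard 2D convolutions with depthwise separable ones (a depthwise convolution followed by a 1x1 pointwise convolution) across a small multi-scale block. The channel width and kernel sizes are illustrative assumptions, not values from the paper, and the helper names are hypothetical.

```python
def standard_conv2d_params(in_ch, out_ch, kernel):
    """Weights in a standard 2D convolution (bias omitted)."""
    k_h, k_w = kernel
    return in_ch * out_ch * k_h * k_w


def depthwise_separable_conv2d_params(in_ch, out_ch, kernel):
    """Weights in a depthwise conv plus a 1x1 pointwise conv (bias omitted)."""
    k_h, k_w = kernel
    depthwise = in_ch * k_h * k_w   # one k_h x k_w filter per input channel
    pointwise = in_ch * out_ch      # 1x1 conv that mixes channels
    return depthwise + pointwise


# Assumed multi-scale block: three parallel branches with different kernels.
CHANNELS = 64                        # hypothetical channel width
KERNELS = [(3, 3), (5, 5), (7, 7)]   # hypothetical branch kernel sizes

standard_total = sum(
    standard_conv2d_params(CHANNELS, CHANNELS, k) for k in KERNELS
)
separable_total = sum(
    depthwise_separable_conv2d_params(CHANNELS, CHANNELS, k) for k in KERNELS
)

print(f"standard:  {standard_total} weights")
print(f"separable: {separable_total} weights")
print(f"reduction: {standard_total / separable_total:.1f}x")
```

Under these assumed sizes, the separable block needs 17,600 weights against 339,968 for the standard one, and the gap widens as the kernels grow, which is why the substitution pays off most in the larger-kernel branches of a multi-scale design.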
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
