Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, Ping Wei, and Junli Li

arXiv:2308.02263 · cs.SD · August 7, 2023

Open Access

TL;DR

This paper introduces Spectrum Attention Fusion, a novel approach that reduces the complexity of Transformer-based speech enhancement models while maintaining or improving performance, making them more efficient for practical use.

Contribution

The paper proposes Spectrum Attention Fusion, a method that replaces multiple self-attention layers with a convolutional module to reduce model size and computational cost.
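To make the size argument concrete, the sketch below counts trainable parameters for one standard multi-head self-attention layer and for one depthwise-separable 1-D convolution of the same width. The convolution is a generic stand-in chosen for illustration, not the module described in the paper, and the width `d_model = 64` and kernel size `kernel = 3` are assumed values.

```python
def self_attention_params(d_model: int, bias: bool = True) -> int:
    """Parameters in one standard multi-head self-attention layer.

    There are four d_model x d_model projections (query, key, value, output).
    The head count does not change the total, because the heads split d_model.
    """
    per_projection = d_model * d_model + (d_model if bias else 0)
    return 4 * per_projection


def depthwise_separable_conv1d_params(channels: int, kernel: int, bias: bool = True) -> int:
    """Parameters in a depthwise 1-D conv followed by a 1x1 pointwise conv.

    This is a hypothetical replacement block, not the paper's module.
    """
    depthwise = channels * kernel + (channels if bias else 0)
    pointwise = channels * channels + (channels if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    d_model, kernel = 64, 3  # assumed values, not taken from the paper
    attn = self_attention_params(d_model)
    conv = depthwise_separable_conv1d_params(d_model, kernel)
    print(f"self-attention layer:     {attn} parameters")
    print(f"depthwise-separable conv: {conv} parameters")
```

Under these assumptions a single attention layer carries several times the parameters of the convolutional block, so replacing a stack of such layers compounds the saving. The actual ratio in the paper depends on its architecture, which this page does not specify.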

Findings

01 Achieves comparable or better results than state-of-the-art models

02 Uses significantly fewer parameters (0.58M)

03 Maintains high speech enhancement quality

Abstract

Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer-based models have recently bested RNN and CNN models in speech enhancement; however, they are much more computationally expensive and require much more high-quality training data, which is always hard to come by. In this paper, we present an improvement for speech enhancement models, which we term Spectrum Attention Fusion, that maintains the expressiveness of self-attention while significantly reducing model complexity. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to fuse spectral features more efficiently. Our proposed model is able to achieve comparable or better results against SOTA models but with significantly smaller…
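The computational cost mentioned in the abstract can be illustrated with a rough multiply-accumulate (MAC) estimate. The sketch below compares how one self-attention layer and one dense 1-D convolution scale with the number of frames. The formulas are standard back-of-the-envelope counts that ignore softmax, biases, and normalization, and the sizes used (`d_model = 64`, `kernel = 3`, 100 and 1000 frames) are assumptions for illustration, not figures from the paper.

```python
def self_attention_macs(seq_len: int, d_model: int) -> int:
    """Approximate MACs for one self-attention layer over seq_len frames."""
    projections = 4 * seq_len * d_model * d_model   # Q, K, V and output projections
    scores = seq_len * seq_len * d_model            # Q @ K^T
    weighted_sum = seq_len * seq_len * d_model      # attention weights @ V
    return projections + scores + weighted_sum


def conv1d_macs(seq_len: int, channels: int, kernel: int) -> int:
    """Approximate MACs for a dense 1-D conv with equal input/output channels."""
    return seq_len * channels * channels * kernel


if __name__ == "__main__":
    d_model, kernel = 64, 3  # assumed values, not taken from the paper
    for frames in (100, 1000):
        attn = self_attention_macs(frames, d_model)
        conv = conv1d_macs(frames, d_model, kernel)
        print(f"{frames:5d} frames: attention {attn:>11d} MACs, conv {conv:>10d} MACs")
```

The attention term that is quadratic in `seq_len` dominates for long inputs, while the convolution grows linearly, which is one reason a convolutional replacement can be cheaper on long utterances.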

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Dense Connections · Label Smoothing · Dropout · Absolute Position Encodings · Byte Pair Encoding