Attention does not guarantee best performance in speech enhancement

Zhongshu Hou; Qinwen Hu; Kai Chen; Jing Lu

arXiv:2302.05690·cs.SD·February 14, 2023

TL;DR

This paper investigates the effectiveness of attention mechanisms in speech enhancement, demonstrating that traditional global attention may not outperform RNNs due to the local nature of speech signals.

Contribution

The study tests the assumption that attention always improves speech enhancement by replacing the attention blocks with RNNs in two SOTA models, MTFAA and CMGAN.

Findings

01. Replacing attention with RNNs does not degrade performance in tested models.

02. Local information may be more important than long-term dependencies in speech enhancement.

03. Attention mechanisms are not universally superior in speech enhancement tasks.

Abstract

The attention mechanism has been widely used in speech enhancement (SE) because, in theory, it can effectively model the long-term inherent connections of the signal in both the time domain and the spectral domain. However, the commonly used global attention mechanism might not be the best choice, since adjacent information naturally has more influence than far-apart information in speech enhancement. In this paper, we validate this conjecture by replacing attention with an RNN in two typical state-of-the-art (SOTA) models: the multi-scale temporal frequency convolutional network with axial attention (MTFAA) and the conformer-based metric-GAN network (CMGAN).
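The swap described in the abstract can be pictured as exchanging one sequence-mixing block for another with the same input and output shape. The sketch below is a toy, pure-Python illustration under stated assumptions, not the MTFAA or CMGAN code: `global_attention` is single-head dot-product self-attention with no learned projections, and `recurrent_mix` is a leaky recurrence standing in for an RNN (a real model would use learned gates, such as an LSTM or GRU). Both function names and the `decay` parameter are hypothetical.

```python
import math


def global_attention(frames):
    """Single-head dot-product self-attention with no learned projections.

    frames: list of T feature vectors (lists of floats), each of length D.
    Every output frame is a softmax-weighted average over ALL T frames, so a
    frame's weight depends on similarity, not on its distance in time.
    """
    dim = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)])
    return out


def recurrent_mix(frames, decay=0.5):
    """Leaky recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A stand-in for an RNN (no gates, no learned weights): the contribution of
    frame t - k to h_t shrinks as decay**k, so nearby frames dominate.
    """
    state = [0.0] * len(frames[0])
    out = []
    for x in frames:
        state = [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]
        out.append(state)
    return out


# Both mixers map T x D -> T x D, so one can replace the other in a model.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attn_out = global_attention(frames)
rnn_out = recurrent_mix(frames)
```

In the recurrence, the weight given to a past frame falls off geometrically with distance, which mirrors the abstract's conjecture that adjacent information matters more than far-apart information. The attention block has no such built-in preference for nearby frames.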

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies