Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement

Xinmeng Xu; Weiping Tu; Yuhong Yang

arXiv:2212.03408·eess.AS·January 16, 2023

TL;DR

Selector-Enhancer introduces a dual-attention CNN that dynamically routes regions of the speech features to either local or non-local attention, improving speech enhancement under variable noise conditions.

Contribution

It proposes a novel feature-filter, trained with reinforcement learning, that dynamically chooses which feature regions are fed to local attention and which to non-local attention.

Findings

01

Achieves speech enhancement results comparable to or better than existing methods.

02

Effective in real-world scenarios with varying noise types and levels.

03

Demonstrates the benefit of dynamic attention selection in speech processing.

Abstract

Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep-learning-based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by indiscriminately applying local and non-local attention is challenging. We observe that the noise type and speech features vary within a sequence of speech, and that local and non-local operations extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention-based convolutional neural network (CNN) with a feature-filter that can dynamically select regions from low-resolution speech features and feed them to local or non-local attention operations. In particular, the proposed feature-filter is trained by using reinforcement learning…
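The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`route_and_attend`, `attend`), the per-frame binary `selection` mask, and the `window` parameter are all hypothetical, and the mask is supplied by hand here rather than produced by a feature-filter trained with reinforcement learning. The sketch only shows how a selector could send each feature frame to a local (windowed) or a non-local (whole-sequence) dot-product attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Dot-product attention: average `context` vectors, weighted by similarity to `query`."""
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context]
    weights = softmax(scores)
    return [
        sum(w * vec[d] for w, vec in zip(weights, context))
        for d in range(len(query))
    ]


def route_and_attend(frames, selection, window=1):
    """Apply local or non-local attention per frame (illustrative assumption).

    frames:    list of feature vectors, one per time frame
    selection: hypothetical selector output; 1 = non-local, 0 = local
    window:    number of neighbouring frames on each side used by local attention
    """
    if len(frames) != len(selection):
        raise ValueError("frames and selection must have the same length")
    enhanced = []
    for t, frame in enumerate(frames):
        if selection[t] == 1:
            # Non-local branch: attend over the whole sequence.
            context = frames
        else:
            # Local branch: attend only over a small neighbourhood of frame t.
            lo = max(0, t - window)
            hi = min(len(frames), t + window + 1)
            context = frames[lo:hi]
        enhanced.append(attend(frame, context))
    return enhanced


# Example call with a hand-written mask (frame 1 uses non-local attention).
example = route_and_attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0, 1, 0])
```

In this sketch the two branches differ only in the context they attend over, which keeps the routing decision easy to inspect; in the paper that decision is learned rather than fixed.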

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies

Methods: Convolution