Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement
Xinmeng Xu, Weiping Tu, Yuhong Yang

TL;DR
Selector-Enhancer introduces a dynamic dual-attention CNN that adaptively routes regions of the speech features to local or non-local attention for speech enhancement, improving performance under variable noise conditions.
Contribution
It proposes a novel feature-filter, trained with reinforcement learning, that dynamically chooses which feature regions are sent to local or non-local attention, making the enhancement network more effective.
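The summary only says the feature-filter is "trained with reinforcement learning". One common way to train a discrete per-region choice is a policy gradient; the sketch below uses REINFORCE as an assumed, swapped-in technique, not the paper's confirmed algorithm. The selector is modeled as a hypothetical Bernoulli policy over regions (1 = non-local, 0 = local), and `reward_fn` is a hypothetical stand-in for whatever enhancement-quality reward the authors use.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reinforce_step(w, region_feats, reward_fn, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update for a hypothetical Bernoulli region selector.

    region_feats: (R, C) pooled feature for each region.
    Action 1 routes a region to non-local attention, 0 to local attention.
    reward_fn(actions) -> float is a stand-in for an enhancement-quality reward.
    """
    probs = sigmoid(region_feats @ w)  # P(non-local) for each region
    # Sample one routing decision per region from the current policy.
    actions = (rng.random(probs.shape) < probs).astype(float)
    reward = reward_fn(actions)
    # For a Bernoulli policy with p = sigmoid(f . w): d/dw log pi(a) = (a - p) * f,
    # summed over regions because the decisions are sampled independently.
    grad_log_pi = ((actions - probs)[:, None] * region_feats).sum(axis=0)
    # Subtracting a baseline reduces gradient variance without biasing it.
    w = w + lr * (reward - baseline) * grad_log_pi
    return w, reward
```

In a training loop, the baseline is typically a running average of past rewards (for example `b = 0.9 * b + 0.1 * reward`), which does not depend on the current sampled actions and so keeps the gradient estimate unbiased.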
Findings
Achieves speech enhancement results comparable to or better than those of existing methods.
Effective in real-world scenarios with varying noise types and levels.
Demonstrates the benefit of dynamic attention selection in speech processing.
Abstract
Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep-learning-based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events; therefore, capturing the most informative speech features by indiscriminately applying local and non-local attention is challenging. We observe that the noise type and speech features vary within a speech sequence, and that local and non-local operations extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention-based convolutional neural network (CNN) with a feature-filter that can dynamically select regions from low-resolution speech features and feed them to local or non-local attention operations. In particular, the proposed feature-filter is trained by using reinforcement learning…
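To make the local versus non-local distinction and the region routing concrete, here is a minimal NumPy sketch under simplifying assumptions: attention uses the features themselves as queries, keys and values (no learned projections), "local" means each frame attends only within a fixed window, and the feature-filter is replaced by a hypothetical linear selector `selector_w` that thresholds each region's mean feature. None of these names or choices come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; masked (-inf) entries become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_attention(x, window=2):
    """Scaled dot-product self-attention where frame t only sees frames within +/- window."""
    T, C = x.shape
    scores = x @ x.T / np.sqrt(C)
    idx = np.arange(T)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(outside, -np.inf, scores)
    return softmax(scores) @ x


def non_local_attention(x):
    """Scaled dot-product self-attention over every frame (a non-local block)."""
    C = x.shape[1]
    scores = x @ x.T / np.sqrt(C)
    return softmax(scores) @ x


def select_and_attend(features, region_len, selector_w):
    """Route each region of `features` (T, C) to local or non-local attention.

    selector_w (C,) is a hypothetical linear selector: a region goes to
    non-local attention when the score of its mean feature is positive.
    """
    T = features.shape[0]
    # For clarity both operations are computed once over the whole sequence;
    # each region then copies its rows from the operation it was routed to.
    local_out = local_attention(features)
    nonlocal_out = non_local_attention(features)
    out = np.empty_like(features)
    choices = []
    for start in range(0, T, region_len):
        stop = min(start + region_len, T)
        score = float(features[start:stop].mean(axis=0) @ selector_w)
        use_nonlocal = score > 0
        choices.append("non-local" if use_nonlocal else "local")
        source = nonlocal_out if use_nonlocal else local_out
        out[start:stop] = source[start:stop]
    return out, choices
```

With `selector_w = np.zeros(C)` every score is 0, so every region is routed to local attention. A practical implementation would evaluate only the selected operation for each region instead of computing both and discarding one.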
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
Methods: Convolution
