Attention-based scaling adaptation for target speech extraction
Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

TL;DR
This paper introduces an attention-based scaling adaptation method that enhances target speech extraction by exploiting speaker clues more effectively, achieving significant performance improvements without additional parameters.
Contribution
It proposes a novel attention mechanism and mixture embedding pooling method for better target speech extraction, with competitive results in single-channel and multi-channel scenarios.
Findings
Improves target speech extraction performance on the spatialized reverberant WSJ0 2-mix dataset.
Single-channel ASA achieves comparable gains to multi-channel IPD-based methods.
Enhances extraction accuracy without adding extra model parameters.
Abstract
Target speech extraction has attracted widespread attention in recent years. In this work, we investigate the dynamic interaction between different mixtures and the target speaker in order to exploit discriminative target speaker clues. We propose a special attention mechanism, placed in a scaling adaptation layer and introducing no additional parameters, to better adapt the network towards extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues more efficiently. Experimental results on the spatialized reverberant WSJ0 2-mix dataset demonstrate that the proposed method effectively improves target speech extraction performance. Furthermore, we find that under the same network configurations, the ASA in a single-channel…
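The abstract does not spell out how the attention weights are computed, so the following is only a rough sketch of what a parameter-free attention in a scaling adaptation layer could look like, not the authors' formulation. It assumes scaled dot-product scores between a frame-level mixture embedding matrix and a target speaker embedding, a softmax over time, and average pooling along the time axis as the "mixture embedding matrix pooling" step. The function name, argument names, and the `pool` parameter are all hypothetical.

```python
import numpy as np


def attention_scaling_adaptation(hidden, mixture_emb, speaker_emb, pool=1):
    """Illustrative, parameter-free attention scaling (assumed formulation).

    hidden:      (T, D) hidden features of the extraction network
    mixture_emb: (T, D) frame-level mixture embedding matrix
    speaker_emb: (D,)   target speaker embedding
    pool:        window size for average pooling along time (assumption)
    """
    T, D = mixture_emb.shape

    # Average-pool the mixture embedding matrix along time.
    # The last segment is padded by repeating the final frame.
    n_seg = -(-T // pool)  # ceil(T / pool)
    pad = n_seg * pool - T
    padded = np.pad(mixture_emb, ((0, pad), (0, 0)), mode="edge")
    pooled = padded.reshape(n_seg, pool, D).mean(axis=1)  # (n_seg, D)

    # Scaled dot-product scores between each pooled segment and the speaker.
    scores = pooled @ speaker_emb / np.sqrt(D)  # (n_seg,)

    # Numerically stable softmax over segments; no trainable weights involved.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Expand segment weights back to frame resolution.
    frame_weights = np.repeat(weights, pool)[:T]  # (T,)

    # Scaling adaptation: multiply hidden features elementwise by the
    # attention-weighted speaker embedding.
    scale = frame_weights[:, None] * speaker_emb[None, :]  # (T, D)
    return hidden * scale, frame_weights
```

In this sketch, a larger `pool` reduces the number of attention scores from T to roughly T / pool, which is one plausible reading of how pooling could make the use of speaker clues more efficient. Everything in the function is computed from its inputs, so it adds no learnable parameters.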
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
