ExARN: self-attending RNN for target speaker extraction
Pengjie Shen, Shulin He, Xueliang Zhang

TL;DR
This paper introduces ExARN, a self-attending RNN model for target speaker extraction that handles speaker identification and separation jointly and reports excellent performance in the presence of competing speakers.
Contribution
It proposes a self-attending RNN architecture and explores several ways of combining auxiliary speaker information with the mixture representation for target speaker extraction.
Findings
Achieves excellent performance on target speaker extraction tasks.
Effectively combines self-attention with RNNs for speaker separation.
Demonstrates robustness in environments with competing speakers.
Abstract
Target speaker extraction aims to extract the speech of a target speaker, specified by an enrollment utterance, from an environment with other competing speakers. The task therefore requires solving two problems at the same time: speaker identification and speech separation. In this paper, we combine self-attention and Recurrent Neural Networks (RNNs). Further, we explore various ways of combining different auxiliary information with the mixture representations. Experimental results show that our proposed model achieves excellent performance on the task of target speaker extraction.
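The abstract does not say which combination schemes were tried, so the following is only a minimal sketch of three fusion options commonly used in target speaker extraction (concatenation, addition, element-wise multiplication), not the authors' method. All names here (`fuse_concat`, `fuse_add`, `fuse_multiply`, `fuse_sequence`) and the toy values are hypothetical, and the speaker embedding is assumed to have already been computed from the enrollment utterance.

```python
from typing import Callable, Dict, List

Vector = List[float]


def fuse_concat(frame: Vector, speaker: Vector) -> Vector:
    """Append the speaker embedding to the frame features."""
    return frame + speaker


def fuse_add(frame: Vector, speaker: Vector) -> Vector:
    """Add the speaker embedding to the frame (dimensions must match)."""
    if len(frame) != len(speaker):
        raise ValueError("additive fusion needs equal dimensions")
    return [f + s for f, s in zip(frame, speaker)]


def fuse_multiply(frame: Vector, speaker: Vector) -> Vector:
    """Scale each frame feature by the matching speaker feature."""
    if len(frame) != len(speaker):
        raise ValueError("multiplicative fusion needs equal dimensions")
    return [f * s for f, s in zip(frame, speaker)]


# Hypothetical registry of fusion modes, for illustration only.
FUSERS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "concat": fuse_concat,
    "add": fuse_add,
    "multiply": fuse_multiply,
}


def fuse_sequence(frames: List[Vector], speaker: Vector, mode: str) -> List[Vector]:
    """Apply the same speaker embedding to every time frame of the mixture."""
    fuse = FUSERS[mode]
    return [fuse(frame, speaker) for frame in frames]


# Toy mixture representation: 2 time frames, 2 features each (made-up values).
mixture_frames = [[0.5, -1.0], [2.0, 0.25]]
# Toy speaker embedding, assumed to come from the enrollment utterance.
speaker_embedding = [1.0, 2.0]

fused_concat = fuse_sequence(mixture_frames, speaker_embedding, "concat")
fused_add = fuse_sequence(mixture_frames, speaker_embedding, "add")
fused_multiply = fuse_sequence(mixture_frames, speaker_embedding, "multiply")
```

In a full model, the fused frames would then be passed to the separation network (here, the self-attention and RNN layers). Concatenation increases the feature dimension, while addition and multiplication keep it fixed but require the embedding and frame dimensions to match.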
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
