ExARN: self-attending RNN for target speaker extraction

Pengjie Shen; Shulin He; Xueliang Zhang

arXiv:2212.01106 · eess.AS · March 14, 2023 · 1 citation

Open Access

TL;DR

This paper introduces ExARN, a self-attending RNN model for target speaker extraction, a task that requires solving speaker identification and separation at the same time. The authors report excellent performance on this task.

Contribution

It proposes an architecture that combines self-attention with RNNs and explores several ways of combining auxiliary information with the mixture representation for target speaker extraction.

Findings

01

Achieves excellent performance on target speaker extraction tasks.

02

Effectively combines self-attention with RNNs for speaker separation.

03

Demonstrates robustness in environments with competing speakers.

Abstract

Target speaker extraction aims to extract the speech of a target speaker, specified by an enrollment utterance, from an environment with other competing speakers. The task therefore requires solving two problems, speaker identification and separation, at the same time. In this paper, we combine self-attention and Recurrent Neural Networks (RNNs). Further, we explore various ways of combining different auxiliary information with mixed representations. Experimental results show that our proposed model achieves excellent performance on the task of target speaker extraction.
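
The abstract does not say which ways of combining auxiliary information were tried. As a rough illustration only, the sketch below shows three fusion schemes that are common in target speaker extraction systems in general: concatenation, element-wise addition, and element-wise multiplication of a speaker embedding with each frame of the mixture representation. The function names, the list-of-frames layout, and the choice of these three schemes are assumptions made for this example, not details taken from the paper.

```python
from typing import List

Frame = List[float]  # one time frame of features (hypothetical layout)


def _check_dims(mixture: List[Frame], speaker: Frame) -> None:
    """Element-wise fusion needs frames and embedding of equal size."""
    for frame in mixture:
        if len(frame) != len(speaker):
            raise ValueError("frame and speaker embedding sizes differ")


def fuse_concat(mixture: List[Frame], speaker: Frame) -> List[Frame]:
    """Append the speaker embedding to every mixture frame."""
    return [frame + speaker for frame in mixture]


def fuse_add(mixture: List[Frame], speaker: Frame) -> List[Frame]:
    """Add the speaker embedding to every mixture frame."""
    _check_dims(mixture, speaker)
    return [[m + s for m, s in zip(frame, speaker)] for frame in mixture]


def fuse_multiply(mixture: List[Frame], speaker: Frame) -> List[Frame]:
    """Scale every mixture frame by the speaker embedding."""
    _check_dims(mixture, speaker)
    return [[m * s for m, s in zip(frame, speaker)] for frame in mixture]


# Toy data: two frames of two features each, plus a two-dimensional embedding.
mixture = [[1.0, 2.0], [3.0, 4.0]]
speaker = [0.5, 2.0]

concat_out = fuse_concat(mixture, speaker)
add_out = fuse_add(mixture, speaker)
mul_out = fuse_multiply(mixture, speaker)
```

Concatenation widens each frame and leaves the following layers to learn how to use the embedding, while addition and multiplication keep the frame size fixed but require the embedding to match it. In a system of the kind the abstract describes, the fused frames would then be passed to the separation network.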

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing