Robust Speaker Extraction Network Based on Iterative Refined Adaptation
Chengyun Deng, Shiqian Ma, Yi Zhang, Yongtao Sha, Hui Zhang, Hui Song, Xiangang Li

TL;DR
This paper introduces an Iterative Refined Adaptation strategy to enhance the robustness and generalization of speaker extraction systems, especially for unseen speakers and mismatched reference voiceprints, demonstrated on two datasets.
Contribution
The paper proposes a novel IRA method that refines speaker embeddings iteratively to improve extraction accuracy and robustness in challenging scenarios.
Findings
IRA improves SI-SDR and PESQ scores.
The method outperforms baseline systems without IRA.
IRA enhances generalization to unseen speakers.
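For readers unfamiliar with the first metric named above, the following is a minimal sketch of the standard SI-SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are illustrative choices (mean removal is a common convention, assumed here); none of this is taken from the paper's evaluation code.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean of each signal (common convention, assumed here).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the reference is rescaled by the projection coefficient, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.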
Abstract
Speaker extraction aims to extract the target speech signal from a multi-talker environment with interfering speakers and surrounding noise, given the target speaker's reference information. Most speaker extraction systems achieve satisfactory performance on the premise that the test speakers have been encountered during training. Such systems suffer from performance degradation given unseen target speakers and/or mismatched reference voiceprint information. In this paper, we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in the aforementioned scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network can obtain a latent representation of the target speaker, which is fed back to the auxiliary network to get a refined embedding to provide…
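The feedback loop described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `toy_auxiliary_net` and `toy_extraction_net` are hypothetical stand-ins for the real networks, `num_iterations` is an assumed parameter, and the choice to re-run extraction with the refined embedding replacing the previous one is an assumption, since the abstract is truncated at that point.

```python
import math


def toy_auxiliary_net(frames):
    """Hypothetical stand-in for the auxiliary network:
    average the frames and L2-normalize the result into an embedding."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def toy_extraction_net(mixture, embedding):
    """Hypothetical stand-in for the extraction network:
    scale each mixture frame by its (non-negative) similarity to the embedding."""
    latent = []
    for frame in mixture:
        gate = max(0.0, sum(x * e for x, e in zip(frame, embedding)))
        latent.append([gate * x for x in frame])
    # In this toy, the estimate and the latent representation coincide.
    return latent, latent


def iterative_refined_adaptation(mixture, reference, auxiliary_net,
                                 extraction_net, num_iterations=2):
    # Initial embedding encoded from the reference utterance.
    embedding = auxiliary_net(reference)
    estimate, latent = extraction_net(mixture, embedding)
    for _ in range(num_iterations):
        # Feed the latent representation back to obtain a refined embedding
        # (assumption: the refined embedding replaces the previous one).
        embedding = auxiliary_net(latent)
        estimate, latent = extraction_net(mixture, embedding)
    return estimate, embedding
```

With, say, `mixture = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]` and `reference = [[0.8, 0.2]]`, each pass pulls the embedding further toward the frames that already resemble the reference, which is the intuition behind adapting the embedding to the test mixture.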
