Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample

Zhiyong Chen; Zhiqi Ai; Xinnuo Li; Shugong Xu

arXiv:2409.15742 · eess.AS · September 25, 2024

TL;DR

This paper presents a new open-set speaker identification framework combining pretrained WavLM, rapid tuning, and advanced reciprocal points learning, significantly improving accuracy in household multi-speaker scenarios.

Contribution

The paper introduces SRPL+, an extension of Speaker Reciprocal Points Learning (SRPL) that adds negative sample learning, and integrates it with a rapid tuning neural network backend for open-set speaker identification.

Findings

1. Achieved up to 27% performance improvement over baseline models.
2. Effectively handles multi-language, text-dependent speaker recognition.
3. Demonstrated robustness in complex household multi-speaker environments.

Abstract

This paper introduces a novel framework for open-set speaker identification in household environments, a task that plays a crucial role in facilitating seamless human-computer interaction. Addressing the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set speaker identification (SID) accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for…
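
To make the open-set decision concrete, here is a minimal sketch of how a reciprocal-points classifier can reject unknown speakers. It follows the general reciprocal points idea (each enrolled speaker k has a point representing "everything that is not k", so a genuine sample of k lies far from it, while unknown samples stay close to all points). It is not the paper's SRPL or SRPL+ formulation: the squared Euclidean distance, the fixed threshold, the 2-D toy embeddings, and the names `identify` and `reciprocal_points` are all assumptions made for illustration. In practice the embeddings would come from a speaker frontend such as WavLM.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(embedding, reciprocal_points, threshold):
    """Open-set decision over enrolled speakers (illustrative sketch).

    reciprocal_points maps each enrolled speaker to a hypothetical point
    standing for "not this speaker". The score for a speaker is the
    distance from the embedding to that speaker's reciprocal point, so a
    larger score means a better match. If even the best score is below
    the threshold, the sample is rejected as an unknown speaker (None).
    """
    scores = {
        speaker: squared_distance(embedding, point)
        for speaker, point in reciprocal_points.items()
    }
    best = max(scores, key=scores.get)
    decision = best if scores[best] >= threshold else None
    return decision, scores


# Toy 2-D setup (assumed values, not taken from the paper).
reciprocal_points = {"alice": [1.0, 0.0], "bob": [-1.0, 0.0]}
threshold = 4.0

alice_sample = [-2.0, 0.0]   # far from alice's reciprocal point
bob_sample = [2.0, 0.0]      # far from bob's reciprocal point
unknown_sample = [0.0, 0.1]  # close to every reciprocal point

for sample in (alice_sample, bob_sample, unknown_sample):
    decision, _ = identify(sample, reciprocal_points, threshold)
    print(sample, "->", decision)
```

The threshold trades false accepts against false rejects. A deployed system would tune it on held-out data, for example using negative samples of the kind the abstract describes.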

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques