SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion

Zhiyong Chen; Shuhang Wu; Yingjie Duan; Xinkang Xu; Xinhui Hu

arXiv:2604.13605·eess.AS·April 16, 2026

TL;DR

This paper introduces SpeakerRPL v2, an open-set speaker identification method that improves robustness and generalization through an enhanced training objective (reciprocal points learning combined with LogitNorm and adaptive anchor learning), model fusion, and model selection. It is validated on the VoxCeleb, ESD, and 3D-Speaker datasets.

Contribution

It presents a novel integration of reciprocal points learning with LogitNorm and adaptive anchor learning, together with a model fusion strategy and a model selection method, for more robust open-set speaker identification.
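For readers unfamiliar with LogitNorm, the sketch below shows the general idea: each logit vector is rescaled to unit L2 norm and divided by a temperature before the usual softmax cross-entropy. It is a minimal NumPy illustration, not the paper's objective. This page does not say how LogitNorm is combined with reciprocal points or adaptive anchors, so the function name `logitnorm_cross_entropy`, the default `temperature=0.05`, and the example logits are all assumptions chosen for illustration.

```python
import numpy as np

def logitnorm_cross_entropy(logits, labels, temperature=0.05, eps=1e-7):
    """Cross-entropy on L2-normalized logits (illustrative sketch).

    logits: array of shape (batch, num_classes)
    labels: integer class indices of shape (batch,)
    temperature: hypothetical value; smaller values sharpen the softmax
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Rescale every logit vector to (approximately) unit norm, then apply the temperature.
    norms = np.linalg.norm(logits, axis=1, keepdims=True) + eps
    scaled = logits / (norms * temperature)
    # Numerically stable log-softmax.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the true classes.
    return float(-log_probs[np.arange(len(labels)), labels].mean())

example_logits = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])
example_labels = np.array([0, 1])
loss = logitnorm_cross_entropy(example_logits, example_labels)
```

Because the logits are normalized first, multiplying them by any positive constant leaves the loss essentially unchanged, so the network cannot lower the loss merely by inflating logit magnitudes.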

Findings

01. Reduces EER from 1.28% to 0.09% on a Vox1-O-like test set

02. Demonstrates robustness across the VoxCeleb, ESD, and 3D-Speaker datasets

03. Improves stability and generalization in few-shot tuning
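The EER (equal error rate) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be computed from two lists of scores follows. The function name `equal_error_rate` and the toy scores are hypothetical, and how the paper builds its trial list is not stated on this page.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) by sweeping observed scores as thresholds."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target_scores, nontarget_scores]))
    # A trial is accepted when its score is at or above the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])     # targets rejected
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds]) # non-targets accepted
    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)

# Toy scores: one target trial scores low and one non-target trial scores high.
eer = equal_error_rate([0.9, 0.8, 0.3], [0.1, 0.2, 0.7])
```

Production toolkits usually interpolate between thresholds rather than picking the nearest one, so values computed this way can differ slightly on small trial lists.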

Abstract

This paper proposes an improved approach for open-set speaker identification based on pretrained speaker foundation models. Building upon the previous Speaker Reciprocal Points Learning framework (V1), we first introduce an enhanced open-set learning objective by integrating reciprocal points learning with logit normalization (LogitNorm) and incorporating adaptive anchor learning to better constrain target speaker representations and improve robustness. Second, we propose a model fusion strategy to stabilize and enhance the few-shot tuning process, effectively reducing result randomness and improving generalization. Furthermore, we introduce a model selection method to ensure optimal performance in model fusion. Experimental evaluations on the VoxCeleb, ESD and 3D-Speaker datasets demonstrate the effectiveness and robustness of the proposed method under diverse conditions. On a newly…
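The abstract does not say whether fusion happens in weight space or score space. As one possible reading only, the sketch below uniformly averages the parameters of several few-shot tuning runs, a common way to reduce run-to-run randomness. The helper `average_state_dicts`, the parameter names, and the values are all hypothetical, and the paper's actual fusion and selection rules may differ.

```python
import numpy as np

def average_state_dicts(state_dicts):
    """Uniformly average same-named parameters across runs (assumed fusion rule)."""
    if not state_dicts:
        raise ValueError("need at least one set of parameters")
    keys = state_dicts[0].keys()
    for params in state_dicts[1:]:
        if params.keys() != keys:
            raise ValueError("parameter names differ between runs")
    # Stack each parameter across runs and take the element-wise mean.
    return {
        name: np.mean([np.asarray(params[name], dtype=float) for params in state_dicts], axis=0)
        for name in keys
    }

# Two hypothetical tuning runs that differ only in random seed.
run_a = {"w": np.array([1.0, 3.0]), "b": np.array([0.0])}
run_b = {"w": np.array([3.0, 5.0]), "b": np.array([2.0])}
fused = average_state_dicts([run_a, run_b])
```

A selection step, as the abstract mentions, would decide which runs enter the average. One simple assumed variant is to keep only the runs with the best validation score before averaging.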
