Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions

Wan Lin; Junhui Chen; Tianhao Wang; Zhenyu Zhou; Lantian Li; Dong Wang

arXiv:2410.16428 · cs.SD · July 4, 2025

TL;DR

This paper introduces Neural Scoring, an end-to-end speaker verification framework that directly estimates verification probabilities, improving robustness in complex multi-talker scenarios and significantly reducing error rates.

Contribution

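The paper presents Neural Scoring (NS), an end-to-end approach that estimates verification posterior probabilities directly, without relying on test-side speaker embeddings. It also introduces a large-scale trial e2e training (LtE2E) strategy, in which each test utterance is paired with a set of enrolled speakers so that many verification trials can be processed per batch. Together these improve robustness in challenging conditions such as multi-talker speech.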

Findings

01

Neural Scoring outperforms baseline methods across various conditions.

02

It achieves a 70.36% reduction in equal error rate (EER) on the VoxCeleb dataset.

03

It is effective in multi-talker speech scenarios.

Abstract

Modern speaker verification systems primarily rely on speaker embeddings, followed by verification based on cosine similarity between the embedding vectors of the enrollment and test utterances. While effective, these methods struggle with multi-talker speech due to the unidentifiability of embedding vectors. In this paper, we propose Neural Scoring (NS), a refreshed end-to-end framework that directly estimates verification posterior probabilities without relying on test-side embeddings, making it more robust to complex conditions, e.g., with multiple talkers. To make the training of such an end-to-end model more efficient, we introduce a large-scale trial e2e training (LtE2E) strategy, where each test utterance pairs with a set of enrolled speakers, thus enabling the processing of large-scale verification trials per batch. Experiments on the VoxCeleb dataset demonstrate that NS…
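The two mechanisms named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. `cosine_score` shows the conventional embedding-based scoring step, and `build_trials` shows one way a batch of trials could be formed by pairing every test utterance with every enrolled speaker. The function names, the data layout, and the labelling rule (a trial is positive when the enrolled speaker is present in the test utterance) are assumptions made for illustration only.

```python
import math
from itertools import product


def cosine_score(enroll_vec, test_vec):
    """Conventional scoring: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(enroll_vec, test_vec))
    norm_e = math.sqrt(sum(a * a for a in enroll_vec))
    norm_t = math.sqrt(sum(b * b for b in test_vec))
    return dot / (norm_e * norm_t)


def build_trials(test_utts, enrolled_speakers):
    """Pair every test utterance with every enrolled speaker in a batch.

    test_utts: list of (utt_id, set_of_speakers_present); hypothetical layout.
    enrolled_speakers: list of speaker ids.
    Returns (utt_id, speaker_id, label) tuples. The label is 1 when the
    enrolled speaker is present in the utterance (assumed labelling rule).
    """
    return [
        (utt_id, spk, int(spk in present))
        for (utt_id, present), spk in product(test_utts, enrolled_speakers)
    ]


# Toy batch: one two-talker utterance and one single-talker utterance.
test_utts = [("utt1", {"A", "B"}), ("utt2", {"C"})]
enrolled = ["A", "B", "C", "D"]
trials = build_trials(test_utts, enrolled)
```

With 2 test utterances and 4 enrolled speakers, this yields 8 trials from a single batch, which is the sense in which pairing each test utterance with a set of enrolled speakers scales up the number of trials seen per batch. In an end-to-end system, a learned scoring network would replace `cosine_score` on the test side; that network is not sketched here.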

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing