A framework of text-dependent speaker verification for Chinese numerical string corpus

Litong Zheng; Feng Hong; Weijie Xu; Wan Zheng

arXiv:2405.07029 · cs.SD · May 22, 2024

TL;DR

This paper introduces an end-to-end text-dependent speaker verification system for Chinese numerical strings that separates speaker and text information, reducing the equal error rate (EER) by 49.2% on Hi-Mia and 75.0% on SHAL.

Contribution

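The paper proposes decoupling speaker and text information in text-dependent speaker verification (TD-SV) through a text embedding extractor, a speaker embedding extractor, and a fusion module, combined with data augmentation, to improve performance on Chinese numerical speech datasets.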

Findings

01. Achieved 49.2% EER reduction on Hi-Mia

02. Achieved 75.0% EER reduction on SHAL

03. Introduced a publicly available Chinese numerical corpus
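
For readers unfamiliar with the metric behind the first two findings, the following is a minimal sketch of how an equal error rate (EER) and a relative EER reduction can be computed from verification scores. It is not the authors' evaluation code: the function names `equal_error_rate` and `relative_reduction`, the threshold-sweep approximation, and all scores and EER values in the example are hypothetical and chosen only for illustration. It also assumes the reported reductions are relative to a baseline EER, which the summary does not state explicitly.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where false acceptance and false rejection
    rates are closest, as the mean of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent (assumed meaning of 'EER reduction')."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, not taken from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)

# Hypothetical baseline and improved EERs (in percent), for illustration only.
reduction = relative_reduction(4.0, 1.0)
```

Production evaluations usually interpolate the detection error trade-off curve rather than picking the nearest observed threshold, so this sketch gives only a coarse estimate on small trial lists.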

Abstract

The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, which can be negatively impacted by reading rhythms and pauses. To address this problem, we propose an end-to-end speaker verification system that enhances TD-SV by decoupling speaker and text information. Our system consists of a text embedding extractor, a speaker embedding extractor, and a fusion module. In the text embedding extractor, we employ an enhanced Transformer and introduce a triple loss comprising a text classification loss, a connectionist temporal classification (CTC) loss, and a decoder loss; while in the speaker…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques