Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems
Danwei Cai, Zexin Cai, Ming Li

TL;DR
This paper explores identifying the original source speaker from voice-converted speech. It shows that training on converted speech labeled with the source speaker makes identification feasible, and that training on converted utterances from diverse voice conversion models improves accuracy on unseen models.
Contribution
It introduces a method of training speaker embedding networks with converted speech data labeled by source speaker, improving source speaker identification in voice conversion scenarios.
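The labeling scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the `Utterance` record, the `build_training_set` helper, and the file names are hypothetical, and the speaker embedding network itself is omitted. The sketch only shows how converted utterances could be merged with genuine ones so that each converted utterance carries its source speaker's class label rather than the target speaker's.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one training utterance."""
    path: str
    speaker: str                          # person who actually spoke (the source speaker)
    target_speaker: Optional[str] = None  # set only for voice-converted audio


def build_training_set(
    genuine: Iterable[Utterance],
    converted: Iterable[Utterance],
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Merge genuine and converted utterances into (path, class_index) pairs.

    Converted utterances are labeled with their SOURCE speaker, so the
    target speaker's identity is deliberately ignored when assigning classes.
    """
    all_utts = list(genuine) + list(converted)
    # One class per distinct source speaker, in a stable sorted order.
    speakers = sorted({u.speaker for u in all_utts})
    index = {name: i for i, name in enumerate(speakers)}
    pairs = [(u.path, index[u.speaker]) for u in all_utts]
    return pairs, index
```

The resulting pairs could then be fed to any classification-style speaker embedding trainer; which network and loss to use is outside the scope of this sketch.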
Findings
Source speaker identification is feasible with converted speech data.
Training with diverse converted utterances improves performance on unseen voice conversion models.
Adding converted data during training enhances robustness against different voice conversion techniques.
Abstract
An automatic speaker verification system aims to verify the speaker identity of a speech signal. However, a voice conversion system could manipulate a person's speech signal to make it sound like another speaker's voice and deceive the speaker verification system. Most countermeasures for voice conversion-based spoofing attacks are designed to discriminate bona fide speech from spoofed speech for speaker verification systems. In this paper, we investigate the problem of source speaker identification -- inferring the identity of the source speaker given the voice converted speech. To perform source speaker identification, we simply add voice-converted speech data with the label of source speaker identity to the genuine speech dataset during speaker embedding network training. Experimental results show the feasibility of source speaker identification when training and testing with…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
