Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion

Xiaohai Tian; Rohan Kumar Das; Haizhou Li

arXiv:1909.07655 · eess.AS · October 30, 2019 · Odyssey · 1 citation

TL;DR

This paper introduces a framework for black-box attacks on automatic speaker verification (ASV) systems. It feeds the ASV system's output scores back into a voice conversion system to generate more deceptive adversarial samples, and it requires no knowledge of the ASV system beyond those outputs.

Contribution

It presents a novel feedback-controlled voice conversion method that enhances the effectiveness of black-box spoofing attacks on ASV systems.

Findings

01 Feedback-controlled voice conversion increases attack success rate.

02 Adversarial samples are perceptually natural and maintain voice quality.

03 The method outperforms straightforward voice conversion in deception effectiveness.

Abstract

Automatic speaker verification (ASV) systems in practice are highly vulnerable to spoofing attacks. The latest voice conversion technologies can produce perceptually natural-sounding speech that mimics any target speaker. However, perceptual closeness to a speaker's identity may not be enough to deceive an ASV system. In this work, we propose a framework that uses the output scores of an ASV system as feedback to a voice conversion system. The attacker framework is a black-box adversary that steals one's voice identity, because it requires no knowledge of the ASV system other than its outputs. Experiments conducted on the ASVspoof 2019 database confirm that the proposed feedback-controlled voice conversion framework produces adversarial samples that are more deceptive than straightforward voice conversion, thereby boosting the impostor ASV scores.…
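The score-feedback idea in the abstract can be illustrated with a toy sketch. This is not the authors' method or their voice conversion model: it is a generic score-guided random search, under the assumption that the attacker can only query a scalar score. The names `make_black_box_asv` and `feedback_controlled_search`, the cosine-similarity scorer, and the 8-dimensional feature vectors are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def make_black_box_asv(target_embedding):
    """Toy stand-in for an ASV system (assumption): returns a scoring
    function whose internals (the target embedding) are hidden from the
    caller, who only sees a scalar similarity score."""
    def score(features):
        dot = sum(a * b for a, b in zip(features, target_embedding))
        norm = (math.sqrt(sum(a * a for a in features))
                * math.sqrt(sum(b * b for b in target_embedding)))
        return dot / norm if norm else 0.0
    return score


def feedback_controlled_search(initial, score_fn, steps=200, step_size=0.1, seed=0):
    """Hypothetical attacker loop: perturb the current features, query the
    black box, and keep the perturbation only if the score improves."""
    rng = random.Random(seed)
    best = list(initial)
    best_score = score_fn(best)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, step_size) for x in best]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:  # the score is the only feedback used
            best, best_score = candidate, candidate_score
    return best, best_score


# Illustrative data: random vectors stand in for speaker features.
rng = random.Random(1)
target = [rng.gauss(0.0, 1.0) for _ in range(8)]
asv_score = make_black_box_asv(target)

baseline = [rng.gauss(0.0, 1.0) for _ in range(8)]  # plain conversion output (toy)
baseline_score = asv_score(baseline)
adversarial, adversarial_score = feedback_controlled_search(baseline, asv_score)
```

Because a candidate is accepted only when its score rises, `adversarial_score` can never fall below `baseline_score`. A real attack would act on speech features through a voice conversion model and would also have to preserve naturalness, which this sketch ignores.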

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques