VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

Hanbo Cai; Pengcheng Zhang; Hai Dong; Yan Xiao; Shunhui Ji

arXiv:2212.10103 · cs.SD · December 21, 2022 · 5 cites

TL;DR

This paper introduces VSVC, a backdoor attack against keyword spotting (KWS) systems that uses voiceprint selection and voice conversion to achieve a high attack success rate while poisoning less than 1% of the training data.

Contribution

The paper presents a novel backdoor attack scheme, VSVC, exploiting voiceprint manipulation to implant backdoors in DNN-based keyword spotting models.

Findings

01 Achieves an average attack success rate close to 97% in experiments

02 Requires poisoning less than 1% of the training data

03 Effective across four different victim models

Abstract

Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved great success in voice control scenarios. However, training such DNN-based KWS systems often requires substantial data and hardware resources, so manufacturers often entrust this process to third-party platforms. This makes the training process uncontrollable: attackers can implant backdoors in the model by manipulating the third-party training data. An effective backdoor attack can force the model to make specified judgments under certain conditions, i.e., when a trigger is present. In this paper, we design a backdoor attack scheme based on Voiceprint Selection and Voice Conversion, abbreviated as VSVC. Experimental results demonstrate that VSVC achieves an average attack success rate close to 97% across four victim models while poisoning less than 1% of the training data.
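The abstract reports two quantities: a poisoning rate (under 1% of the training data) and an average attack success rate (ASR). The following is a minimal sketch of how these two quantities are commonly computed in backdoor evaluations. It works under stated assumptions and is not the authors' code. It does not implement voiceprint selection or voice conversion. The function names select_poison_indices and attack_success_rate are hypothetical. The choices to draw poisoned samples only from non-target classes and to exclude target-class clips from the ASR are common conventions, not details confirmed by the paper.

```python
import random
from typing import Sequence


def select_poison_indices(labels: Sequence[str], target: str,
                          rate: float, seed: int = 0) -> list[int]:
    """Hypothetical helper: pick a random subset of training clips to poison.

    The budget is int(rate * len(labels)), so rate=0.01 poisons at most 1%
    of the whole training set. Candidates exclude clips already labeled with
    the target keyword (an assumed convention, not necessarily VSVC's).
    """
    budget = int(rate * len(labels))
    candidates = [i for i, y in enumerate(labels) if y != target]
    rng = random.Random(seed)  # seeded for reproducible selection
    return sorted(rng.sample(candidates, min(budget, len(candidates))))


def attack_success_rate(preds_on_triggered: Sequence[str],
                        true_labels: Sequence[str], target: str) -> float:
    """Hypothetical helper: share of triggered non-target clips predicted as target.

    Clips whose true label is already the target are skipped, since they
    would count as hits even without a trigger.
    """
    pairs = [(p, y) for p, y in zip(preds_on_triggered, true_labels) if y != target]
    if not pairs:
        return 0.0
    hits = sum(1 for p, _ in pairs if p == target)
    return hits / len(pairs)
```

For example, with 1,000 training clips and rate=0.01, the budget is 10 poisoned clips. An ASR "close to 97%" would mean that roughly 97 of every 100 triggered non-target test clips are classified as the attacker's chosen keyword.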

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing