Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification

Qing Wang; Jixun Yao; Ziqian Wang; Pengcheng Guo; Lei Xie

arXiv:2305.19020 · cs.SD · May 31, 2023 · 1 citation



Open Access

TL;DR

This paper introduces a black-box adversarial attack on speaker identification that preserves the target speaker's timbre by using a pseudo-Siamese network, reaching attack success rates of up to 60.58% (white-box) and 55.38% (black-box) and deceiving both SID models and human listeners.

Contribution

It proposes a timbre-reserved adversarial attack framework that uses a pseudo-Siamese network to learn a substitute for the black-box speaker identification (SID) model, so that the attack can deceive SID systems in black-box settings.

Findings

01

Achieves up to 60.58% attack success rate in white-box scenarios.

02

Achieves up to 55.38% attack success rate in black-box scenarios.

03

Successfully deceives both human listeners and machine models.
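The findings above are reported as attack success rates (ASR). As a minimal sketch, assuming the common targeted definition (the share of fake utterances that the SID model assigns to the attacker's designated speaker; the paper's exact evaluation protocol is not stated here), the metric can be computed as follows. The function name and the example labels are hypothetical.

```python
from typing import Sequence


def attack_success_rate(predicted: Sequence[int], designated: Sequence[int]) -> float:
    """Percentage of adversarial utterances that the SID model assigns to the
    attacker's designated speaker (assumed targeted-attack definition)."""
    if len(predicted) != len(designated):
        raise ValueError("predicted and designated must have the same length")
    if not predicted:
        raise ValueError("need at least one trial")
    # Count trials where the model's decision matches the designated speaker.
    hits = sum(p == d for p, d in zip(predicted, designated))
    return 100.0 * hits / len(predicted)


# Hypothetical trial outcomes: 3 of 5 fake utterances are classified as speaker 7.
print(attack_success_rate([7, 7, 2, 7, 5], [7, 7, 7, 7, 7]))  # 60.0
```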

Abstract

In this study, we propose a timbre-reserved adversarial attack approach for speaker identification (SID) that not only exploits the weakness of the SID model but also preserves the timbre of the target speaker in a black-box attack setting. Specifically, we generate timbre-reserved fake audio by adding an adversarial constraint during the training of the voice conversion (VC) model. We then leverage a pseudo-Siamese network architecture to learn from the black-box SID model, constraining both intrinsic similarity and structural similarity simultaneously. The intrinsic similarity loss is used to learn an intrinsic invariance, while the structural similarity loss ensures that the substitute SID model shares a decision boundary similar to that of the fixed black-box SID model. The substitute model can then be used as a proxy to generate timbre-reserved fake audio for attacking. Experimental results on the Audio…
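The two substitute-model losses and the adversarial constraint described in the abstract can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the intrinsic similarity loss is assumed to be a per-utterance cosine distance between the substitute's and the black-box model's speaker-score vectors, the structural similarity loss is a relational-distillation-style stand-in that matches the two models' pairwise similarity matrices over a batch, and the adversarial constraint is assumed to be a targeted cross-entropy term added to the VC loss. All function names and weights (`alpha`, `beta`, `lam`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(logits: Vector) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def intrinsic_similarity_loss(sub_scores: Vector, bb_scores: Vector) -> float:
    # ASSUMED form: per-utterance cosine distance between the two branches' scores.
    return 1.0 - cosine(sub_scores, bb_scores)


def structural_similarity_loss(sub_batch: Sequence[Vector], bb_batch: Sequence[Vector]) -> float:
    # ASSUMED stand-in (relational distillation): mean squared difference between
    # the pairwise cosine-similarity matrices of the two branches over a batch.
    n = len(sub_batch)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = cosine(sub_batch[i], sub_batch[j]) - cosine(bb_batch[i], bb_batch[j])
            total += diff * diff
    return total / (n * n)


def substitute_loss(sub_batch, bb_batch, alpha: float = 1.0, beta: float = 1.0) -> float:
    # Hypothetical weighting of the two terms. Only the substitute branch would be
    # trained; the black-box branch is queried for output scores and stays fixed.
    intrinsic = sum(
        intrinsic_similarity_loss(s, b) for s, b in zip(sub_batch, bb_batch)
    ) / len(sub_batch)
    return alpha * intrinsic + beta * structural_similarity_loss(sub_batch, bb_batch)


def adversarial_vc_loss(vc_loss: float, sub_logits_on_fake: Vector, target: int, lam: float = 0.1) -> float:
    # ASSUMED targeted constraint: cross-entropy pushing the substitute SID model
    # toward the designated speaker, added to the ordinary VC training loss.
    probs = softmax(sub_logits_on_fake)
    return vc_loss + lam * -math.log(probs[target])


# Hypothetical scores for a batch of two utterances over three speakers.
sub = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
bb = [[1.8, 0.7, -0.9], [0.2, 1.4, 0.1]]
print(substitute_loss(sub, bb))
print(adversarial_vc_loss(1.0, [0.0, 0.0], target=0))
```

Plain floats keep the sketch dependency-free; in practice these terms would be computed on autodiff tensors so that gradients flow into the substitute SID model (for the first two losses) and into the VC model (for the adversarial term). Because the black-box branch contributes only its output scores, neither loss needs access to its weights.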


Taxonomy

Topics: Speech Recognition and Synthesis · Digital Media Forensic Detection · Music and Audio Processing