TL;DR
FoolHD is a white-box adversarial attack that hides nearly imperceptible perturbations in audio files to fool speaker identification systems, reaching a success rate above 99% with a Gated Convolutional Autoencoder that operates in the DCT domain.
Contribution
This work introduces FoolHD, a steganography-inspired adversarial attack leveraging a Gated Convolutional Autoencoder trained with a multi-objective loss to generate imperceptible perturbations in audio for speaker identification.
Findings
Achieves over 99% success rate in fooling speaker ID models.
Generates audio with PESQ scores above 4.30, indicating high imperceptibility.
Effective against a 250-speaker VoxCeleb-based system.
Abstract
Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification. In this work, we propose a white-box steganography-inspired adversarial attack that generates imperceptible adversarial perturbations against a speaker identification model. Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function, in order to generate and conceal the adversarial perturbation within the original audio files. In addition to hindering speaker identification performance, this multi-objective loss accounts for human perception through a frame-wise cosine similarity between MFCC feature vectors extracted from the original and adversarial audio files. We validate the effectiveness of FoolHD with a 250-speaker identification x-vector network,…
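The perceptual term described in the abstract, a frame-wise cosine similarity between MFCC feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the MFCCs have already been extracted elsewhere into lists of per-frame coefficient vectors. The names `framewise_cosine_similarity` and `perceptual_loss` are hypothetical, as is the choice to turn the similarity into a loss as one minus its mean. The random data only stands in for real MFCC frames.

```python
import math
import random


def framewise_cosine_similarity(feats_a, feats_b, eps=1e-8):
    """Return one cosine similarity per frame.

    feats_a, feats_b: sequences of frames, each frame a sequence of
    coefficients (for example 13 MFCCs). Both must have the same length.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both inputs must have the same number of frames")
    sims = []
    for frame_a, frame_b in zip(feats_a, feats_b):
        dot = sum(x * y for x, y in zip(frame_a, frame_b))
        norm_a = math.sqrt(sum(x * x for x in frame_a))
        norm_b = math.sqrt(sum(y * y for y in frame_b))
        # eps guards against division by zero on silent (all-zero) frames.
        sims.append(dot / max(norm_a * norm_b, eps))
    return sims


def perceptual_loss(feats_a, feats_b):
    """Hypothetical loss: 1 minus the mean frame-wise cosine similarity."""
    sims = framewise_cosine_similarity(feats_a, feats_b)
    return 1.0 - sum(sims) / len(sims)


# Stand-in data: 100 frames of 13 coefficients, plus a small perturbation.
rng = random.Random(0)
original = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(100)]
adversarial = [[x + 0.01 * rng.gauss(0.0, 1.0) for x in frame] for frame in original]

loss = perceptual_loss(original, adversarial)
print(f"perceptual loss: {loss:.6f}")
```

A loss near zero means the perturbed features point in almost the same direction as the originals in every frame. In a multi-objective setup like the one the abstract describes, a term of this kind would be weighed against the misclassification objective.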
Taxonomy
Methods
