An Effective Energy Mask-based Adversarial Evasion Attacks against Misclassification in Speaker Recognition Systems
Chanwoo Park, Chanwoo Kim

TL;DR
This paper introduces masked energy perturbation (MEP), a novel energy masking-based adversarial attack that fools speaker recognition systems while keeping audio quality high and perceptual distortion minimal.
Contribution
The study presents MEP, a new energy masking technique in the frequency domain that enhances adversarial attack effectiveness against speaker recognition models.
Findings
MEP outperforms FGSM in evasion success.
Audio quality remains high with minimal perceptual distortion.
MEP is effective against advanced speaker recognition models.
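For context on the FGSM baseline named above: FGSM takes a single step of size epsilon along the sign of the loss gradient with respect to the input. The sketch below illustrates that step on a toy linear scorer. It is not the paper's experimental setup; `fgsm_perturb`, the weight vector `w`, and the value of `epsilon` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon):
    """One FGSM step: move x by epsilon along the sign of the loss gradient."""
    return x + epsilon * np.sign(grad)

# Toy stand-in for a speaker model: a linear score w.x (assumption, not from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)   # one second of synthetic "audio" at 16 kHz
w = rng.standard_normal(16000)   # hypothetical model weights

# With loss = -score, increasing the loss lowers the score (evasion).
# The gradient of that loss with respect to x is -w.
grad = -w
x_adv = fgsm_perturb(x, grad, epsilon=0.002)

score_before = float(w @ x)
score_after = float(w @ x_adv)
max_change = float(np.max(np.abs(x_adv - x)))  # bounded by epsilon per sample
```

Because every sample moves by exactly epsilon, the perturbation is spread uniformly over the signal regardless of where the signal's energy lies.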
Abstract
Evasion attacks pose significant threats to AI systems, exploiting vulnerabilities in machine learning models to bypass detection mechanisms. The widespread use of voice data, including deepfakes, in promising future industries is currently hindered by insufficient legal frameworks. Adversarial attack methods have emerged as the most effective countermeasure against the indiscriminate use of such data. This research introduces masked energy perturbation (MEP), a novel approach using power spectrum for energy masking of original voice data. MEP applies masking to small energy regions in the frequency domain before generating adversarial perturbations, targeting areas less noticeable to the human auditory model. The study primarily employs advanced speaker recognition models, including ECAPA-TDNN and ResNet34, which have shown remarkable performance in speaker verification tasks. The…
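The energy-masking step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how low-energy regions are selected or exactly how the mask is used, so the quantile threshold, the whole-signal FFT (rather than a framed transform), the choice to zero the selected bins, and the names `low_energy_mask` and `mask_low_energy` are all hypothetical.

```python
import numpy as np

def low_energy_mask(signal, quantile=0.2):
    """Boolean mask over rFFT bins whose power is at or below the given quantile."""
    power = np.abs(np.fft.rfft(signal)) ** 2   # power spectrum
    threshold = np.quantile(power, quantile)   # assumed selection rule
    return power <= threshold

def mask_low_energy(signal, quantile=0.2):
    """Zero the low-energy bins; return the time-domain signal and the mask."""
    spectrum = np.fft.rfft(signal)
    mask = low_energy_mask(signal, quantile)
    spectrum[mask] = 0.0
    masked = np.fft.irfft(spectrum, n=len(signal))
    return masked, mask

# Synthetic example: a 440 Hz tone plus faint noise, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)

masked, mask = mask_low_energy(signal, quantile=0.2)

# Fraction of the signal's energy removed by the mask.
removed = float(np.sum((signal - masked) ** 2) / np.sum(signal ** 2))
```

In this toy case the dominant 440 Hz bin is never selected, and the masked bins carry only a tiny share of the total energy, which matches the intuition that changes confined to low-energy regions are less noticeable. An adversarial perturbation, such as an FGSM step, would then be generated from the masked signal.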
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Emotion and Mood Recognition
