Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model
Massimiliano Todisco, Michele Panariello, Xin Wang, Héctor Delgado, Kong Aik Lee, Nicholas Evans

TL;DR
Malacopula is a neural-based model that generates adversarial speech perturbations to deceive automatic speaker verification systems, revealing new vulnerabilities and emphasizing the need for robust defenses.
Contribution
This paper introduces Malacopula, a novel neural generalised Hammerstein model for adversarial attacks on ASV systems, demonstrating increased vulnerability in current systems.
Findings
Malacopula significantly increases ASV vulnerability.
Speech quality is reduced in adversarial examples.
Attacks can be detected effectively under controlled conditions.
Abstract
We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions followed by linear time-invariant filters. The adversarial optimisation procedure acts to minimise the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances. Experiments, performed using three recent ASV systems and the ASVspoof 2019 dataset, show that Malacopula increases vulnerabilities by a substantial margin. However, speech quality is reduced and attacks can be detected effectively under controlled conditions. The findings emphasise the need to identify new…
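The structure described in the abstract (parallel polynomial branches, each followed by a linear time-invariant filter, optimised against a cosine distance between speaker embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hammerstein` and `cosine_distance` are hypothetical, the filters here are fixed FIR coefficients rather than parameters learned by a neural optimisation, and the number of branches and filter length are arbitrary assumptions.

```python
import numpy as np

def hammerstein(x, filters):
    """Generalised Hammerstein forward pass (illustrative sketch only).

    Branch k (1-based) raises the input to the k-th power and passes it
    through its own causal FIR filter; the branch outputs are summed.
    `filters` is a list of 1-D coefficient arrays, one per branch
    (an assumption; the paper learns its filters during optimisation).
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, h in enumerate(filters, start=1):
        # Static polynomial non-linearity followed by an LTI (FIR) filter,
        # truncated so the output has the same length as the input.
        y += np.convolve(x ** k, h, mode="full")[: len(x)]
    return y

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (1 - cosine similarity)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy usage: a 3-branch model applied to a short random signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
toy_filters = [np.array([1.0, 0.0, 0.0]),   # linear branch: pass-through
               np.array([0.0, 0.05, 0.0]),  # quadratic branch: small delayed term
               np.array([0.0, 0.0, 0.01])]  # cubic branch: smaller, longer delay
perturbed = hammerstein(signal, toy_filters)
```

In an attack of the kind the abstract describes, the filter coefficients would be adjusted to reduce `cosine_distance` between the speaker embedding of the processed spoofed utterance and that of a bona fide utterance; the embedding extractor and the optimiser are outside the scope of this sketch.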
Taxonomy
Topics: Speech Recognition and Synthesis · Digital Media Forensic Detection
