CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

TL;DR
This paper investigates the vulnerability of audio deepfake detectors to manipulation attacks and introduces CLAD, a contrastive learning-based method, to significantly improve robustness against such manipulations.
Contribution
The paper presents CLAD, a novel contrastive learning approach that enhances the robustness of audio deepfake detectors against various manipulation attacks.
Findings
Manipulation attacks can significantly bypass existing detectors.
CLAD reduces false acceptance rates to below 1.63% under various attacks.
Existing detectors are vulnerable to volume control, fading, and noise injection.
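The three manipulations named above can be illustrated with a minimal sketch. It is not the paper's attack code: the function names, the gain, fade length, and SNR values, and the toy sine-wave input are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random

def volume_control(wave, gain):
    """Scale every sample by a constant gain, clipping to [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in wave]

def fade_in(wave, fade_len):
    """Linearly ramp the first fade_len samples from 0 up to full amplitude."""
    fade_len = min(fade_len, len(wave))
    return [s * (i / fade_len) if i < fade_len else s
            for i, s in enumerate(wave)]

def inject_noise(wave, snr_db, seed=0):
    """Add white Gaussian noise at roughly the requested signal-to-noise ratio."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in wave) / len(wave)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in wave]

# Toy input (assumption): 0.1 s of a 440 Hz tone sampled at 16 kHz.
sr = 16000
wave = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(1600)]

quieter = volume_control(wave, 0.5)   # halve the amplitude
faded = fade_in(wave, 400)            # 25 ms linear fade-in
noisy = inject_noise(wave, snr_db=20) # white noise at about 20 dB SNR
```

Each function leaves the length and content of the signal intact while changing low-level signal statistics, which is the kind of perturbation the findings describe as able to bypass detectors.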
Abstract
The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the…
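The key idea stated in the abstract, pulling together the representations of an audio clip and its manipulated version, can be sketched with a generic InfoNCE-style contrastive loss. This is an assumption for illustration, not necessarily the exact objective used in CLAD; the function names, the temperature value, and the 2-dimensional toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Hypothetical embeddings: `anchor` is a clip, `manipulated` is the same clip
# after a manipulation such as volume control, `others` are different clips.
anchor = [1.0, 0.0]
manipulated = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.0]]

aligned = info_nce(anchor, manipulated, others)
# If the encoder instead placed the manipulated clip far from the anchor,
# the loss would be much larger.
misaligned = info_nce(anchor, others[0], [manipulated, others[1]])
```

Minimizing a loss of this shape rewards an encoder whose embedding of a clip changes little under manipulation, which is the robustness property the abstract describes.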
Taxonomy
Topics: Digital Media Forensic Detection · Music and Audio Processing · Adversarial Robustness in Machine Learning
Methods: Contrastive Learning
