Attribution-Guided Multimodal Deepfake Detection via Cross-Modal Forensic Fingerprints
Wasim Ahmad, Wei Zhang, Xuerui Mao

TL;DR
This paper introduces AMDD, a multimodal deepfake detection framework that jointly detects and attributes forgeries by enforcing consistent generator-specific forensic features across the audio and visual streams, improving robustness and interpretability.
Contribution
AMDD combines attribution-guided learning with a cross-modal forensic fingerprint consistency loss to improve both deepfake detection and generator attribution, reducing the reliance on dataset-specific artifacts that limits binary detectors.
Findings
AMDD achieves 99.7% balanced accuracy on FakeAVCeleb.
Cross-dataset evaluation shows that detection of real videos remains robust.
Generator attribution accuracy reaches 95.9%.
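For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of per-class recall, which keeps a detector from scoring well simply by predicting the majority class. The sketch below uses that standard definition; the function name and the toy labels are illustrative assumptions, and the summary does not state exactly how the paper computes the figure.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for binary labels this is (TPR + TNR) / 2."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of all samples whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy, made-up labels: 0 = real, 1 = fake, with fakes heavily over-represented.
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1] * 10  # a degenerate detector that always answers "fake"

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On this toy input the always-fake detector gets 80% plain accuracy but only 50% balanced accuracy, which is why the balanced figure is the more informative one on imbalanced data.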
Abstract
Audio-visual deepfakes have reached a level of realism that makes perceptual detection unreliable, threatening media integrity and biometric security. While multimodal detection has shown promise, most approaches treat the task as binary classification and often latch onto dataset-specific artifacts rather than genuine generative traces. We argue that a detector incapable of identifying how a video was forged is likely learning the wrong signal. Unlike binary detection, attribution-guided learning imposes a stronger geometric constraint on the shared embedding space, forcing the model to encode generator-specific forensic content rather than shortcuts. We propose the Attribution-Guided Multimodal Deepfake Detection (AMDD) framework, which jointly learns to detect and attribute manipulation. AMDD treats generator attribution as a structured regularization that constrains representation…
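One way to picture the joint objective described above is as a weighted sum of a detection loss, an attribution loss, and a term that pulls paired audio and visual embeddings together. The following is a minimal sketch under stated assumptions, not the paper's formulation: the cosine-based consistency term, the weights `lambda_attr` and `lambda_cons`, and all function names are hypothetical, since the summary does not give the exact form of the loss.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def cosine_consistency(audio_emb, visual_emb, eps=1e-8):
    """Mean (1 - cosine similarity) between paired audio and visual embeddings.

    Assumed stand-in for a cross-modal fingerprint consistency term.
    """
    a = audio_emb / (np.linalg.norm(audio_emb, axis=1, keepdims=True) + eps)
    v = visual_emb / (np.linalg.norm(visual_emb, axis=1, keepdims=True) + eps)
    return (1.0 - (a * v).sum(axis=1)).mean()


def joint_loss(det_logits, det_labels, attr_logits, attr_labels,
               audio_emb, visual_emb, lambda_attr=1.0, lambda_cons=0.1):
    """Detection loss plus attribution and consistency terms as regularizers.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    l_det = softmax_cross_entropy(det_logits, det_labels)     # real vs. fake
    l_attr = softmax_cross_entropy(attr_logits, attr_labels)  # which generator
    l_cons = cosine_consistency(audio_emb, visual_emb)        # audio/visual agreement
    return l_det + lambda_attr * l_attr + lambda_cons * l_cons


# Random toy batch: 4 clips, 2 detection classes, 5 hypothetical generators.
rng = np.random.default_rng(0)
loss = joint_loss(
    det_logits=rng.normal(size=(4, 2)), det_labels=np.array([0, 1, 1, 0]),
    attr_logits=rng.normal(size=(4, 5)), attr_labels=np.array([0, 2, 4, 1]),
    audio_emb=rng.normal(size=(4, 16)), visual_emb=rng.normal(size=(4, 16)),
)
print(float(loss))
```

In this sketch the attribution term acts as the regularizer: because the shared representation must also separate generators from one another, dataset-level shortcuts that merely separate real from fake become less useful to the model.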
