Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities

Jingtong Dou; Chuancheng Shi; Jian Wang; Fei Shen; Zhiyong Wang; Tat-Seng Chua

arXiv:2604.07763 · cs.CV · April 10, 2026

TL;DR

This paper introduces a modality-agnostic forgery (MAF) detection framework that captures latent forgery knowledge shared across modalities, significantly improving generalization to unseen "dark modalities" in multimodal deepfake detection.

Contribution

The paper proposes what the authors describe as the first modality-agnostic forgery (MAF) detection framework, together with a benchmark, DeepModal-Bench, for evaluating cross-modal generalization, as a step toward universal multimodal forgery defense.

Findings

01. Empirically demonstrates the existence of universal forgery traces.

02. Achieves significant performance improvements on unknown modalities.

03. Introduces the DeepModal-Bench for rigorous generalization assessment.

Abstract

As generative artificial intelligence evolves, deepfake attacks have escalated from single-modality manipulations to complex, multimodal threats. Existing forensic techniques face a severe generalization bottleneck: by relying excessively on superficial, modality-specific artifacts, they neglect the shared latent forgery knowledge hidden beneath variable physical appearances. Consequently, these models suffer catastrophic performance degradation when confronted with unseen "dark modalities." To break this limitation, this paper introduces a paradigm shift that redefines multimodal forensics from conventional "feature fusion" to "modality generalization." We propose the first modality-agnostic forgery (MAF) detection framework. By explicitly decoupling modality-specific styles, MAF precisely extracts the essential, cross-modal latent forgery knowledge. Furthermore, we define two…
