Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection

Zihan Xiong; Xiaohua Wu; Lei Chen; Fangqi Lou

arXiv:2505.12966 · cs.CV · May 20, 2025


TL;DR

This paper introduces a novel multimodal deepfake detection model that balances audio-visual information using contrastive learning and orthogonalization techniques, achieving high accuracy and strong cross-dataset generalization.

Contribution

The paper presents a multiscale adaptive conflict-balancing model that mitigates modality conflicts in multimedia deepfake detection through contrastive-learning-assisted cross-modal fusion and an orthogonalization module that resolves gradient conflicts between the audio and video encoders.

Findings

01

Achieves an average accuracy of 95.5% across multiple datasets.

02

Demonstrates superior cross-dataset generalization with significant accuracy improvements.

03

Outperforms previous methods on mainstream deepfake datasets.

Abstract

Advances in computer vision and deep learning have blurred the line between deepfakes and authentic media, undermining multimedia credibility through audio-visual forgery. Current multimodal detection methods remain limited by unbalanced learning between modalities. To tackle this issue, we propose an Audio-Visual Joint Learning Method (MACB-DF) that mitigates modality conflict and modality neglect by using contrastive learning to support multi-level, cross-modal fusion, thereby balancing and fully exploiting the information from each modality. Additionally, we design an orthogonalization-multimodal Pareto module that preserves unimodal information while addressing gradient conflicts in the audio and video encoders caused by the differing optimization targets of the loss functions. Extensive experiments and ablation studies conducted on mainstream deepfake datasets demonstrate consistent…
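The abstract does not spell out how the orthogonalization-multimodal Pareto module works. As a rough illustration of the general idea (two loss terms pulling a shared encoder in conflicting directions), the NumPy sketch below combines two well-known, swapped-in techniques: a PCGrad-style projection that removes the conflicting component of each gradient, followed by the closed-form two-objective MGDA (min-norm) weighting. The function names, the names `g_fusion` and `g_unimodal`, and the toy gradients are hypothetical; this is not the authors' module.

```python
import numpy as np

# Assumption: generic PCGrad + two-task MGDA, NOT the MACB-DF module itself.


def project_if_conflicting(g_a: np.ndarray, g_b: np.ndarray) -> np.ndarray:
    """PCGrad-style step: if g_a conflicts with g_b (negative dot product),
    drop g_a's component along g_b so the result is orthogonal to g_b."""
    dot = float(np.dot(g_a, g_b))
    norm_sq = float(np.dot(g_b, g_b))
    if dot < 0.0 and norm_sq > 0.0:
        return g_a - (dot / norm_sq) * g_b
    return g_a.copy()


def min_norm_weight(g1: np.ndarray, g2: np.ndarray) -> float:
    """Closed-form two-objective MGDA weight: the alpha in [0, 1] that
    minimizes ||alpha * g1 + (1 - alpha) * g2||."""
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:  # identical gradients: every weight gives the same vector
        return 0.5
    alpha = float(np.dot(g2 - g1, g2)) / denom
    return min(max(alpha, 0.0), 1.0)


def balanced_direction(g_fusion: np.ndarray, g_unimodal: np.ndarray) -> np.ndarray:
    """Hypothetical combination of two loss gradients w.r.t. the same shared
    encoder parameters: orthogonalize conflicting pairs (each against the
    original other gradient), then take the min-norm convex combination."""
    g1 = project_if_conflicting(g_fusion, g_unimodal)
    g2 = project_if_conflicting(g_unimodal, g_fusion)
    alpha = min_norm_weight(g1, g2)
    return alpha * g1 + (1.0 - alpha) * g2


# Toy example: two conflicting gradients for a 2-parameter encoder.
g_fusion = np.array([1.0, 0.0])
g_unimodal = np.array([-1.0, 1.0])
d = balanced_direction(g_fusion, g_unimodal)
print(d)  # [0.5 0.5]
```

In this toy case the raw gradients have a negative dot product, while the combined direction `d` has a non-negative dot product with both, so a step along `-d` does not increase either loss to first order. How MACB-DF actually weights or orthogonalizes its objectives would need to be checked against the full paper.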


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning

Methods: Contrastive Learning
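Contrastive learning is listed as the paper's core method, and the abstract says it supports multi-level, cross-modal fusion, but the exact loss is not given here. A common way to align two modalities is a symmetric InfoNCE (CLIP-style) loss over a batch of audio and video embeddings. The NumPy sketch below assumes that choice; `audio_visual_infonce`, the 0.07 temperature, and the toy embeddings are illustrative assumptions, not MACB-DF's implementation.

```python
import numpy as np

# Assumption: generic symmetric InfoNCE; the paper's exact contrastive
# objective is not stated in this abstract.


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def audio_visual_infonce(audio_emb: np.ndarray, video_emb: np.ndarray,
                         temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of audio_emb and row i of video_emb form a
    matched (positive) pair; every other pairing in the batch is a negative."""
    a = l2_normalize(audio_emb)
    v = l2_normalize(video_emb)
    logits = a @ v.T / temperature  # (N, N); the diagonal holds matched pairs
    idx = np.arange(logits.shape[0])
    loss_a2v = -log_softmax(logits, axis=1)[idx, idx].mean()  # audio -> video
    loss_v2a = -log_softmax(logits, axis=0)[idx, idx].mean()  # video -> audio
    return float(0.5 * (loss_a2v + loss_v2a))


rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
aligned_video = audio + 0.05 * rng.normal(size=(8, 16))  # synchronized pairs
shuffled_video = np.roll(aligned_video, 1, axis=0)       # mismatched pairs
print(audio_visual_infonce(audio, aligned_video) < audio_visual_infonce(audio, shuffled_video))  # True
```

In a real detector the embeddings would come from the audio and video encoders and the loss would be computed in an autodiff framework; NumPy is used here only to keep the sketch dependency-light.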