Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection
Zihan Xiong, Xiaohua Wu, Lei Chen, Fangqi Lou

TL;DR
This paper introduces a novel multimodal deepfake detection model that balances audio-visual information using contrastive learning and orthogonalization techniques, achieving high accuracy and strong cross-dataset generalization.
Contribution
The paper presents a multiscale adaptive conflict-balancing model that effectively mitigates modality conflicts in multimedia deepfake detection through innovative fusion and orthogonalization modules.
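The abstract on this page attributes the modality conflict to gradient conflicts in the audio and video encoders. As a rough illustration of what resolving such a conflict by orthogonalization can look like, the sketch below uses a projection in the style of PCGrad (gradient surgery), a swapped-in technique from the multi-task learning literature and not necessarily the paper's orthogonalization-multimodal Pareto module. The function names `project_out_conflict` and `combine_gradients` are hypothetical, and gradients are represented as plain lists of floats for simplicity.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out_conflict(g: Vector, ref: Vector) -> Vector:
    """Return g with its component along ref removed, but only when the
    two gradients conflict (negative inner product). Otherwise g is
    returned unchanged. This is a PCGrad-style projection, used here as
    an assumption, not as the paper's confirmed method."""
    d = dot(g, ref)
    if d >= 0.0:
        return list(g)  # no conflict: leave the gradient untouched
    scale = d / dot(ref, ref)  # ref is non-zero here because d < 0
    return [a - scale * b for a, b in zip(g, ref)]


def combine_gradients(g_audio: Vector, g_video: Vector) -> Vector:
    """Deconflict each modality's gradient against the other, then sum."""
    a = project_out_conflict(g_audio, g_video)
    v = project_out_conflict(g_video, g_audio)
    return [x + y for x, y in zip(a, v)]


if __name__ == "__main__":
    g_audio = [1.0, 0.0]   # toy gradient from a hypothetical audio loss
    g_video = [-1.0, 1.0]  # toy gradient from a hypothetical video loss
    print(project_out_conflict(g_audio, g_video))
    print(combine_gradients(g_audio, g_video))
```

After projection, each modality's gradient is orthogonal to the other's original gradient, so an update along one no longer directly undoes progress on the other. Gradients that already agree pass through unchanged.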
Findings
Achieves an average accuracy of 95.5% across multiple datasets.
Demonstrates superior cross-dataset generalization with significant accuracy improvements.
Outperforms previous methods on mainstream deepfake datasets.
Abstract
Advances in computer vision and deep learning have blurred the line between deepfakes and authentic media, undermining multimedia credibility through audio-visual forgery. Current multimodal detection methods remain limited by unbalanced learning between modalities. To tackle this issue, we propose an Audio-Visual Joint Learning Method (MACB-DF) that mitigates modality conflict and modality neglect by using contrastive learning to assist multi-level and cross-modal fusion, thereby balancing and fully exploiting the information from each modality. Additionally, we design an orthogonalization-multimodal Pareto module that preserves unimodal information while addressing gradient conflicts in the audio and video encoders caused by the differing optimization targets of the loss functions. Extensive experiments and ablation studies conducted on mainstream deepfake datasets demonstrate consistent…
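The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice: a symmetric InfoNCE loss that pulls together the audio and video embeddings of the same clip and pushes apart embeddings from different clips. The function name `info_nce`, the temperature value, and the assumption that row i of each batch comes from the same clip are all illustrative and not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _mean_cross_entropy_diagonal(logits: Matrix) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_denom - row[i]
    return total / len(logits)


def info_nce(audio: Matrix, video: Matrix, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch of paired embeddings.
    Assumption: audio[i] and video[i] belong to the same clip."""
    a = [l2_normalize(x) for x in audio]
    v = [l2_normalize(x) for x in video]
    n = len(a)
    # Cosine similarity of every audio/video pair, scaled by temperature.
    logits = [
        [sum(p * q for p, q in zip(a[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    transposed = [list(col) for col in zip(*logits)]
    audio_to_video = _mean_cross_entropy_diagonal(logits)
    video_to_audio = _mean_cross_entropy_diagonal(transposed)
    return 0.5 * (audio_to_video + video_to_audio)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(audio, aligned), info_nce(audio, swapped))
```

In this toy batch the aligned pairing yields a loss near zero while the swapped pairing yields a large loss, which is the behaviour a cross-modal alignment term is meant to reward. How such a term would be weighted against the classification loss is not stated on this page.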
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning
Methods: Contrastive Learning
