Detecting Deepfakes with Multivariate Soft Blending and CLIP-based Image-Text Alignment

Jingwei Li; Jiaxin Tong; Pengfei Wu

arXiv:2602.15903 · cs.CV · February 19, 2026

TL;DR

This paper introduces MSBA-CLIP, a novel deepfake detection framework that uses multimodal alignment, data augmentation, and forgery intensity estimation to improve accuracy and robustness across diverse datasets.

Contribution

The paper proposes a new framework combining multivariate soft blending augmentation and CLIP-guided forgery intensity estimation for enhanced deepfake detection.

Findings

1. Achieves state-of-the-art accuracy and AUC improvements in in-domain tests.
2. Demonstrates strong cross-domain generalization across five datasets.
3. Validates the effectiveness of the proposed components through ablation studies.

Abstract

The proliferation of highly realistic facial forgeries necessitates robust detection methods. However, existing approaches often suffer from limited accuracy and poor generalization due to significant distribution shifts among samples generated by diverse forgery techniques. To address these challenges, we propose a novel Multivariate and Soft Blending Augmentation with CLIP-guided Forgery Intensity Estimation (MSBA-CLIP) framework. Our method leverages the multimodal alignment capabilities of CLIP to capture subtle forgery traces. We introduce a Multivariate and Soft Blending Augmentation (MSBA) strategy that synthesizes images by blending forgeries from multiple methods with random weights, forcing the model to learn generalizable patterns. Furthermore, a dedicated Multivariate Forgery Intensity Estimation (MFIE) module is designed to explicitly guide the model in learning features…
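The MSBA idea described in the abstract (blending forgeries from multiple methods with random weights) can be illustrated with a short NumPy sketch. This is not the authors' implementation. The function name `multivariate_soft_blend`, the Dirichlet sampling of the weights, the externally supplied soft mask, and the assumption that every forgery is a pixel-aligned manipulation of the same real face are all hypothetical choices made for illustration.

```python
import numpy as np


def multivariate_soft_blend(real, forgeries, soft_mask, rng, alpha=1.0):
    """Hypothetical MSBA-style augmentation (illustrative sketch only).

    real:      (H, W, C) float array, the pristine face.
    forgeries: list of K arrays shaped like `real`; each is assumed to be a
               different forgery method applied to the same, aligned face.
    soft_mask: (H, W) float array in [0, 1]; fractional values give a soft
               blending boundary instead of a hard cut.
    rng:       numpy.random.Generator used to draw the mixing weights.
    Returns the blended image and the K random mixing weights.
    """
    stack = np.stack(forgeries, axis=0)  # (K, H, W, C)
    if stack.shape[1:] != real.shape:
        raise ValueError("every forgery must have the same shape as the real image")

    # Random convex weights over the K methods: non-negative and summing to 1.
    weights = rng.dirichlet(np.full(len(forgeries), alpha))

    # Weighted mixture of the forged images across methods -> (H, W, C).
    mixed_fake = np.tensordot(weights, stack, axes=1)

    # Soft alpha blending: mask=1 keeps the mixed forgery, mask=0 keeps the real pixel.
    m = soft_mask[..., None]
    blended = m * mixed_fake + (1.0 - m) * real
    return blended, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = np.zeros((4, 4, 3))
    forgeries = [np.ones((4, 4, 3)), np.full((4, 4, 3), 0.5)]
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0  # forged region in the centre of the face
    image, weights = multivariate_soft_blend(real, forgeries, mask, rng)
    print(image.shape)  # (4, 4, 3)
```

Returning the mixing weights is also an assumption: they are one plausible per-method intensity target for an MFIE-style head, but the visible abstract does not say how the paper defines its intensity labels. Sampling from a Dirichlet distribution is simply a convenient way to get random weights that stay on the simplex, so the mixture remains a valid interpolation between the forgeries.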

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Face recognition and analysis