MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization

Yaning Zhang, Tianyi Wang, Zan Gao, Yibo Zhao, Chunjie Ma, Meng Wang

arXiv:2605.10071 · cs.CV · May 12, 2026

TL;DR

This paper introduces MFVLR, a multi-domain vision-language model that improves generalization in detecting and localizing diffusion-synthesized face forgeries in cross-generator, cross-forgery, and cross-dataset settings.

Contribution

The paper proposes a novel multi-domain vision-language reconstruction approach with a fine-grained language transformer and vision encoder for better forgery detection and localization.

Findings

01. Outperforms state-of-the-art methods in cross-generator evaluations.

02. Remains effective in cross-forgery and cross-dataset scenarios.

03. Improves forgery localization accuracy.

Abstract

The swift advancement of photo-realistic face generation technology has sparked considerable concern across society and academia, underscoring the need for generalizable face forgery detection and localization methods. Prior works tend to capture face forgery patterns across multiple domains using the image modality, while other modalities, such as fine-grained text, have not been comprehensively investigated, which restricts the generalization capability of models. Moreover, these works usually analyze facial images created by GANs and struggle to identify and localize those synthesized by diffusion models. To address these problems, we devise a novel multi-domain fine-grained vision-language reconstruction (MFVLR) model, which explores comprehensive and diverse visual forgery traces via language-guided face forgery representation learning, to achieve generalizable diffusion-synthesized face forgery…
