Towards General Visual-Linguistic Face Forgery Detection(V2)

Ke Sun; Shen Chen; Taiping Yao; Ziyin Zhou; Jiayi Ji; Xiaoshuai Sun; Chia-Wen Lin; Rongrong Ji

arXiv:2502.20698·cs.CV·March 3, 2025

Open Access · 1 Repo

TL;DR

This paper introduces FFTG, a novel annotation pipeline that improves text descriptions for face forgery detection by reducing hallucinations, leading to better model performance across benchmarks.

Contribution

The paper proposes FFTG, a new annotation method leveraging forgery masks and prompting strategies to generate accurate descriptions, enhancing multimodal face forgery detection.

Findings

01 Higher region identification accuracy

02 Improved detection performance on benchmarks

03 Effective reduction of hallucination in annotations

Abstract

Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust. Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection. However, existing annotation approaches, whether through human labeling or direct Multimodal Large Language Model (MLLM) generation, often suffer from hallucination issues, leading to inaccurate text descriptions, especially for high-quality forgeries. To address this, we propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification, followed by a comprehensive prompting strategy to guide MLLMs in reducing hallucination. We validate our approach through fine-tuning both CLIP with a three-branch training…
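
As a rough illustration of the mask-guided first step described in the abstract, the sketch below takes a binary forgery mask, checks which face regions it overlaps, and turns the result into a prompt for an MLLM. It is a minimal sketch under stated assumptions, not the FFTG implementation: the region boxes, the `min_ratio` threshold, the prompt wording, and the function names `forged_regions` and `build_prompt` are all hypothetical.

```python
import numpy as np

# Hypothetical face regions as (top, bottom, left, right) boxes on a 64x64 crop.
# These boxes are illustrative assumptions, not values from the paper.
REGIONS = {
    "eyes": (16, 28, 8, 56),
    "nose": (28, 42, 24, 40),
    "mouth": (42, 56, 16, 48),
}


def forged_regions(mask, regions=REGIONS, min_ratio=0.2):
    """Return the names of regions whose forged-pixel ratio is at least min_ratio.

    mask is a 2D array in which nonzero (or True) pixels are marked as forged.
    """
    hits = []
    for name, (top, bottom, left, right) in regions.items():
        # Fraction of pixels inside the box that the mask flags as forged.
        ratio = float(np.asarray(mask, dtype=float)[top:bottom, left:right].mean())
        if ratio >= min_ratio:
            hits.append(name)
    return hits


def build_prompt(region_names):
    """Build a prompt that restricts the MLLM to the regions found in the mask.

    The wording is an assumption made for this example.
    """
    if not region_names:
        return "The mask shows no manipulated region. Describe the face as real."
    listed = ", ".join(region_names)
    return (
        f"The following face regions were manipulated: {listed}. "
        "Describe the visible artifacts in these regions only."
    )


# Usage: a mask that covers only the hypothetical mouth box.
mask = np.zeros((64, 64), dtype=bool)
mask[42:56, 16:48] = True
regions_found = forged_regions(mask)
prompt = build_prompt(regions_found)
```

Naming the regions in the prompt, rather than asking the model to find them, is one way to limit hallucinated descriptions: the model is asked to describe only areas that the mask already marks as altered.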

Code & Models

Repositories

skjack/vlffd · Official

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Contrastive Language-Image Pre-training