Towards General Visual-Linguistic Face Forgery Detection (V2)
Ke Sun, Shen Chen, Taiping Yao, Ziyin Zhou, Jiayi Ji, Xiaoshuai Sun, Chia-Wen Lin, Rongrong Ji

TL;DR
This paper introduces FFTG, a novel annotation pipeline that improves text descriptions for face forgery detection by reducing hallucinations, leading to better model performance across benchmarks.
Contribution
The paper proposes FFTG, a new annotation method leveraging forgery masks and prompting strategies to generate accurate descriptions, enhancing multimodal face forgery detection.
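The mask-guided idea can be illustrated with a toy sketch: derive a binary forgery mask from a real/fake image pair, flag the facial regions the mask covers, and build a prompt that confines the MLLM to those regions. This is not the authors' implementation. The function names, region boxes, thresholds, and prompt wording are all hypothetical, chosen only to show how a mask could constrain the text description.

```python
# Hypothetical sketch: names, region boxes, thresholds, and prompt wording
# are illustrative assumptions, not taken from the FFTG paper.

def forgery_mask(real, fake, pixel_threshold=30):
    """Binary mask marking pixels where the real and fake grayscale images differ."""
    return [
        [1 if abs(r - f) > pixel_threshold else 0 for r, f in zip(real_row, fake_row)]
        for real_row, fake_row in zip(real, fake)
    ]

def forged_regions(mask, region_boxes, area_threshold=0.2):
    """Return names of regions whose forged-pixel fraction exceeds area_threshold."""
    flagged = []
    for name, (top, left, bottom, right) in region_boxes.items():
        pixels = [mask[y][x] for y in range(top, bottom) for x in range(left, right)]
        if pixels and sum(pixels) / len(pixels) > area_threshold:
            flagged.append(name)
    return flagged

def build_prompt(regions):
    """Compose a prompt that restricts the MLLM to the mask-identified regions."""
    if not regions:
        return "Describe the face. No manipulated regions were detected."
    return ("Describe the visual artifacts in the following regions only: "
            + ", ".join(regions) + ".")

# Toy 4x4 grayscale images: only the top-left 2x2 block differs.
real = [[100] * 4 for _ in range(4)]
fake = [row[:] for row in real]
for y in range(2):
    for x in range(2):
        fake[y][x] = 200

# Assumed region boxes as (top, left, bottom, right), end-exclusive.
boxes = {"eyes": (0, 0, 2, 2), "mouth": (2, 0, 4, 4)}
mask = forgery_mask(real, fake)
regions = forged_regions(mask, boxes)
prompt = build_prompt(regions)
```

Grounding the prompt in regions computed from the mask, rather than asking the model to find artifacts unaided, is one plausible way to keep generated descriptions from mentioning regions that were never manipulated.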
Findings
Higher region identification accuracy
Improved detection performance on benchmarks
Effective reduction of hallucination in annotations
Abstract
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust. Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection. However, existing annotation approaches, whether through human labeling or direct Multimodal Large Language Model (MLLM) generation, often suffer from hallucination issues, leading to inaccurate text descriptions, especially for high-quality forgeries. To address this, we propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification, followed by a comprehensive prompting strategy to guide MLLMs in reducing hallucination. We validate our approach through fine-tuning both CLIP with a three-branch training…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
Methods: Contrastive Language-Image Pre-training
