Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline
Jingchun Lian, Lingyu Liu, Yaxiong Wang, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng

TL;DR
This paper introduces a new task and dataset for generating detailed attribution reports for manipulated facial images, combining localization and natural language explanations to improve understanding of forgeries.
Contribution
It presents a novel multimodal task, a large-scale dataset (MMTT), and a unified framework (ForgeryTalker) for explainable facial forgery attribution.
Findings
ForgeryTalker achieves a CIDEr score of 59.3 on report generation.
The dataset contains 152,217 samples with precise annotations.
The model performs well on both localization and explanation tasks.
Abstract
Existing facial forgery detection methods typically focus on binary classification or pixel-level localization, providing little semantic insight into the nature of the manipulation. To address this, we introduce Forgery Attribution Report Generation, a new multimodal task that jointly localizes forged regions ("Where") and generates natural language explanations grounded in the editing process ("Why"). This dual-focus approach goes beyond traditional forensics, providing a comprehensive understanding of the manipulation. To enable research in this domain, we present Multi-Modal Tamper Tracing (MMTT), a large-scale dataset of 152,217 samples, each with a process-derived ground-truth mask and a human-authored textual description, ensuring high annotation precision and linguistic richness. We further propose ForgeryTalker, a unified end-to-end framework that integrates vision and language…
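As a rough illustration of the sample structure the abstract describes (an image paired with a ground-truth mask and a textual description), a record might be sketched as follows. The class name, field names, and the `forged_fraction` helper are hypothetical and are not taken from the MMTT release; the mask values and report text are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForgerySample:
    """Hypothetical record layout; not the actual MMTT schema."""
    image_path: str          # path to the manipulated face image (placeholder)
    mask: List[List[int]]    # the "Where": 1 = forged pixel, 0 = untouched pixel
    report: str              # the "Why": human-authored description of the edit


def forged_fraction(sample: ForgerySample) -> float:
    """Return the share of pixels that the mask marks as forged."""
    total = sum(len(row) for row in sample.mask)
    forged = sum(sum(row) for row in sample.mask)
    return forged / total if total else 0.0


# Toy 2x4 mask in which the right half is marked as edited.
sample = ForgerySample(
    image_path="example.png",
    mask=[[0, 0, 1, 1],
          [0, 0, 1, 1]],
    report="The right side of the face was edited.",
)
print(forged_fraction(sample))
```

A model for this task would take the image as input and be expected to produce both the mask and the report; the sketch covers only the data side and says nothing about how ForgeryTalker itself is built.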
