Pixels Don't Lie (But Your Detector Might): Bootstrapping MLLM-as-a-Judge for Trustworthy Deepfake Detection and Reasoning Supervision
Kartik Kuckreja, Parul Gupta, Muhammad Haris Khan, Abhinav Dhall

TL;DR
This paper introduces DeepfakeJudge, a scalable framework for evaluating and supervising the reasoning of deepfake detection models. Its judge reaches 96.2% accuracy on the paper's meta-evaluation benchmark and correlates highly with human ratings, with the aim of making detectors more interpretable and trustworthy.
Contribution
We propose a bootstrapped reasoning supervision method for deepfake detection that enhances interpretability and evaluation without requiring explicit ground truth rationales.
Findings
DeepfakeJudge achieves 96.2% accuracy on the proposed meta-evaluation benchmark.
The reasoning judge correlates highly with human ratings and reaches 98.9% pairwise agreement.
Participants preferred our model's reasoning 70% of the time for faithfulness and usefulness.
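The agreement figures above can be made concrete with a small sketch. It shows one common way to compute pairwise agreement and Pearson correlation between a judge's scores and human ratings. The function names, the tie-handling rule, and the toy scores are all assumptions for illustration; this listing does not describe the paper's exact protocol.

```python
from itertools import combinations


def pairwise_agreement(judge_scores, human_scores):
    """Fraction of item pairs that the judge orders the same way as humans.

    Assumption: pairs tied in the human scores are skipped, and a tie in the
    judge scores on a non-tied human pair counts as a disagreement.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference to agree with
        judge_diff = judge_scores[i] - judge_scores[j]
        total += 1
        if human_diff * judge_diff > 0:
            agree += 1
    return agree / total if total else float("nan")


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy ratings for five rationales (not data from the paper).
human = [1, 2, 3, 4, 5]
judge = [1, 2, 3, 5, 4]

agreement = pairwise_agreement(judge, human)
correlation = pearson(judge, human)
```

In this toy case the judge swaps only the last two items, so it disagrees with the human ordering on one of the ten pairs. Pointwise evaluation corresponds to comparing the score lists directly, as `pearson` does, while pairwise evaluation compares orderings, as `pairwise_agreement` does.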
Abstract
Deepfake detection models often generate natural-language explanations, yet their reasoning is frequently ungrounded in visual evidence, limiting reliability. Existing evaluations measure classification accuracy but overlook reasoning fidelity. We propose DeepfakeJudge, a framework for scalable reasoning supervision and evaluation that integrates an out-of-distribution benchmark containing recent generative and editing forgeries, a human-annotated subset with visual reasoning labels, and a suite of evaluation models that specialize in evaluating reasoning rationales without the need for explicit ground-truth rationales. The Judge is optimized through a bootstrapped generator-evaluator process that scales human feedback into structured reasoning supervision and supports both pointwise and pairwise evaluation. On the proposed meta-evaluation benchmark, our…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
