Pixels Don't Lie (But Your Detector Might): Bootstrapping MLLM-as-a-Judge for Trustworthy Deepfake Detection and Reasoning Supervision

Kartik Kuckreja; Parul Gupta; Muhammad Haris Khan; Abhinav Dhall

arXiv:2602.19715 · cs.CV · February 24, 2026

Open Access

TL;DR

This paper introduces DeepfakeJudge, a scalable framework for evaluating and supervising the reasoning of deepfake detection models. Its judge reaches 96.2% accuracy on the paper's meta-evaluation benchmark and correlates highly with human ratings, with the aim of making detector explanations more interpretable and trustworthy.

Contribution

We propose a bootstrapped reasoning supervision method for deepfake detection that improves interpretability and evaluation without requiring explicit ground-truth rationales.

Findings

01

DeepfakeJudge achieves 96.2% accuracy on the proposed meta-evaluation benchmark.

02

The reasoning judge correlates highly with human ratings and has 98.9% pairwise agreement.

03

Participants preferred our model's reasoning 70% of the time for faithfulness and usefulness.
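
The findings above report correlation with human ratings and pairwise agreement, but this page does not define either metric. The following is a minimal sketch of one common way such numbers could be computed, not the authors' evaluation code. The function names `pairwise_agreement` and `pearson`, the tie-handling rule, and the toy scores are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_agreement(judge_scores, human_scores):
    """Fraction of item pairs that the judge orders the same way as humans.

    Assumption: pairs that humans rate equally are skipped, and a judge tie
    on a pair that humans separate counts as a disagreement.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference to agree with
        judge_diff = judge_scores[i] - judge_scores[j]
        total += 1
        if judge_diff * human_diff > 0:
            agree += 1
    return agree / total if total else float("nan")


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: five rationales scored by humans and by a judge.
human = [1, 2, 3, 4, 5]
judge = [1, 2, 3, 5, 4]

print(pairwise_agreement(judge, human))
print(pearson(judge, human))
```

In this toy case the judge swaps only the last two items, so one of the ten pairs is ordered differently from the human ranking. The paper may use a different correlation coefficient or a different treatment of ties.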

Abstract

Deepfake detection models often generate natural-language explanations, yet their reasoning is frequently ungrounded in visual evidence, which limits reliability. Existing evaluations measure classification accuracy but overlook reasoning fidelity. We propose DeepfakeJudge, a framework for scalable reasoning supervision and evaluation that integrates an out-of-distribution benchmark containing recent generative and editing forgeries, a human-annotated subset with visual reasoning labels, and a suite of evaluation models that specialize in evaluating reasoning rationales without requiring explicit ground-truth rationales. The Judge is optimized through a bootstrapped generator-evaluator process that scales human feedback into structured reasoning supervision and supports both pointwise and pairwise evaluation. On the proposed meta-evaluation benchmark, our…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis