REVEAL -- Reasoning and Evaluation of Visual Evidence through Aligned Language

Ipsita Praharaj; Yukta Butala; Badrikanath Praharaj; Yash Butala

arXiv:2508.12543 · cs.CV · September 9, 2025

Open Access

TL;DR

REVEAL introduces a vision-language framework for detecting and explaining image forgeries by reasoning about the entire scene and individual regions, improving cross-domain generalization.

Contribution

The paper presents a prompt-driven reasoning framework that uses vision-language models for forgery detection and localization across diverse datasets.

Findings

01. Effective at detecting forgeries in Photoshop, DeepFake, and AIGC images.

02. Outperforms baseline models in cross-domain forgery detection.

03. Provides interpretable reasoning for detected manipulations.

Abstract

The rapid advancement of generative models has intensified the challenge of detecting and interpreting visual forgeries, necessitating robust frameworks that detect image forgeries while also providing reasoning and localization. While existing works approach this problem using supervised training for specific manipulations or anomaly detection in the embedding space, generalization across domains remains a challenge. We frame forgery detection as a prompt-driven visual reasoning task, leveraging the semantic alignment capabilities of large vision-language models. We propose a framework, `REVEAL` (Reasoning and Evaluation of Visual Evidence through Aligned Language), that incorporates generalized guidelines. We propose two tangential approaches: (1) Holistic Scene-level Evaluation that relies on the physics, semantics, perspective, and realism of the image as a whole…
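As a rough illustration of what framing forgery detection as a prompt-driven reasoning task could look like, the sketch below builds a scene-level prompt from the four aspects the abstract names (physics, semantics, perspective, realism) and parses a structured reply. The prompt wording, the JSON reply schema, and the helper names `build_scene_prompt` and `parse_verdict` are assumptions made for this example; they are not taken from the paper.

```python
import json

# Aspects the abstract lists for holistic scene-level evaluation.
CRITERIA = ("physics", "semantics", "perspective", "realism")


def build_scene_prompt(criteria=CRITERIA):
    """Build a text prompt asking a vision-language model to judge an image as a whole.

    The wording and the JSON reply format are illustrative assumptions,
    not the prompts used in the paper.
    """
    lines = [
        "Examine the attached image as a whole and decide whether it is forged.",
        "Assess each of the following aspects in turn:",
    ]
    lines += [f"- {name}" for name in criteria]
    lines.append(
        'Reply with JSON: {"verdict": "real" or "forged", "reasons": {aspect: explanation}}'
    )
    return "\n".join(lines)


def parse_verdict(reply, criteria=CRITERIA):
    """Parse a JSON reply into (is_forged, reasons); the schema is hypothetical."""
    data = json.loads(reply)
    verdict = data["verdict"].strip().lower()
    if verdict not in ("real", "forged"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    # Keep only the explanations that correspond to the requested aspects.
    reasons = {k: v for k, v in data.get("reasons", {}).items() if k in criteria}
    return verdict == "forged", reasons
```

Sending the prompt and image to a model is left out on purpose, since the client call depends on which vision-language model is used.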

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization