REVEAL -- Reasoning and Evaluation of Visual Evidence through Aligned Language
Ipsita Praharaj, Yukta Butala, Badrikanath Praharaj, Yash Butala

TL;DR
REVEAL introduces a vision-language framework for detecting and explaining image forgeries by reasoning about the entire scene and individual regions, improving cross-domain generalization.
Contribution
It presents a novel prompt-driven reasoning framework using vision-language models for forgery detection and localization across diverse datasets.
Findings
Effective in detecting forgeries in Photoshop, DeepFake, and AIGC images.
Outperforms baseline models in cross-domain forgery detection.
Provides interpretable reasoning for detected manipulations.
Abstract
The rapid advancement of generative models has intensified the challenge of detecting and interpreting visual forgeries, calling for robust frameworks that detect image forgeries while also providing reasoning and localization. Existing works approach this problem through supervised training on specific manipulation types or through anomaly detection in the embedding space, but generalization across domains remains a challenge. We frame forgery detection as a prompt-driven visual reasoning task, leveraging the semantic alignment capabilities of large vision-language models. We propose a framework, `REVEAL` (Reasoning and Evaluation of Visual Evidence through Aligned Language), that incorporates generalized guidelines. We propose two tangential approaches: (1) Holistic Scene-level Evaluation that relies on the physics, semantics, perspective, and realism of the image as a whole…
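To make the idea of prompt-driven holistic evaluation concrete, the following is a minimal sketch, not the authors' implementation. It only assembles a text prompt from the four scene-level criteria named in the abstract (physics, semantics, perspective, realism) and parses a reply. The function names, the prompt wording, and the `VERDICT:` reply format are all assumptions made for illustration; the call to an actual vision-language model is deliberately left out.

```python
from dataclasses import dataclass

# The four scene-level criteria named in the abstract.
CRITERIA = ("physics", "semantics", "perspective", "realism")


def build_holistic_prompt(criteria=CRITERIA) -> str:
    """Assemble a prompt asking a vision-language model to judge the whole image.

    The wording and the VERDICT reply format are hypothetical, not from the paper.
    """
    lines = [
        "Examine the image as a whole and assess whether it has been manipulated.",
        "Consider each of the following aspects:",
    ]
    lines += [f"- {name}" for name in criteria]
    lines.append(
        "Answer on the first line with 'VERDICT: REAL' or 'VERDICT: FORGED', "
        "then explain your reasoning."
    )
    return "\n".join(lines)


@dataclass
class Verdict:
    forged: bool
    reasoning: str


def parse_response(text: str) -> Verdict:
    """Split a model reply into a binary label and its free-text explanation."""
    first, _, rest = text.strip().partition("\n")
    label = first.upper().replace("VERDICT:", "").strip()
    if label not in {"REAL", "FORGED"}:
        raise ValueError(f"unrecognized verdict line: {first!r}")
    return Verdict(forged=(label == "FORGED"), reasoning=rest.strip())
```

In use, the string from `build_holistic_prompt()` would be sent together with the image to whichever vision-language model is available, and the reply passed to `parse_response`. Keeping the explanation alongside the label reflects the abstract's emphasis on interpretable reasoning rather than a bare classification.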
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
