TL;DR
OmniVL-Guard Pro is a tool-augmented vision-language forensics agent that moves beyond closed-world models by calling external tools, such as real-time event search and local cropping and zooming, and by training with a new dataset and reinforcement learning, which improves forgery detection and reasoning.
Contribution
It introduces a novel tool-augmented agent framework for open-world vision-language forensics, together with a new training dataset and a reinforcement learning approach.
Findings
Achieves state-of-the-art performance on multiple forensics tasks.
Demonstrates strong zero-shot generalization capabilities.
Outperforms existing methods in real-time event verification and forgery segmentation.
Abstract
Existing vision-language forgery detection and grounding methods operate under a closed-world paradigm, assuming verification can be completed by the model alone. However, self-contained MLLMs are constrained by finite parametric knowledge, static training corpora, and limited perceptual resolution. This creates a practical ceiling in dynamic open-world forensics, particularly for real-time event verification, which requires external clues, and for forgery segmentation, which demands fine-grained scrutiny of local manipulations. To address these limitations, we shift from scaling up the self-contained model toward reaching beyond it. We propose OmniVL-Guard Pro, a tool-augmented agent that extends unified forensics from closed-world prediction to open-world, clue-driven reasoning. OmniVL-Guard Pro integrates a tool environment spanning real-time event search, local cropping and zooming,…
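The abstract describes the agent only at a high level. As a rough illustration of what a tool-augmented verification loop can look like, here is a minimal sketch, assuming a simple "policy emits a tool call, environment returns an observation" protocol. Every name in it (`crop_and_zoom`, `event_search`, `FAKE_INDEX`, `run_agent`, `scripted_policy`) is hypothetical. The search tool is an offline stub rather than a real search API, and the scripted policy stands in for the MLLM, which the paper trains with reinforcement learning. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]  # toy grayscale image, row-major pixel values


def crop_and_zoom(image: Image, box: Tuple[int, int, int, int], scale: int) -> Image:
    """Crop box=(top, left, bottom, right), end-exclusive, then upscale by nearest neighbour."""
    top, left, bottom, right = box
    region = [row[left:right] for row in image[top:bottom]]
    zoomed: Image = []
    for row in region:
        wide = [px for px in row for _ in range(scale)]  # repeat each pixel horizontally
        for _ in range(scale):                           # repeat each row vertically
            zoomed.append(list(wide))
    return zoomed


# Hypothetical offline stand-in for a real-time event search service.
FAKE_INDEX: Dict[str, List[str]] = {
    "bridge collapse": ["No news reports found for this event."],
}


def event_search(query: str) -> List[str]:
    """Return stored snippets for a query (stub; a real agent would hit a live search API)."""
    return FAKE_INDEX.get(query.lower(), [])


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Step:
    call: Optional[ToolCall] = None  # tool to invoke next, if any
    verdict: Optional[str] = None    # final answer; ends the episode when set


def run_agent(policy: Callable[[List[object]], Step],
              tools: Dict[str, Callable[..., object]],
              max_steps: int = 5) -> Tuple[str, List[object]]:
    """Alternate policy decisions and tool executions until a verdict or the step budget runs out."""
    observations: List[object] = []
    for _ in range(max_steps):
        step = policy(observations)
        if step.verdict is not None:
            return step.verdict, observations
        if step.call is None:
            raise ValueError("policy returned neither a tool call nor a verdict")
        observations.append(tools[step.call.name](**step.call.args))
    return "undecided", observations


IMAGE: Image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
TOOLS = {"crop_and_zoom": crop_and_zoom, "event_search": event_search}


def scripted_policy(observations: List[object]) -> Step:
    """Hypothetical fixed policy: zoom into a region, search for the event, then decide."""
    if len(observations) == 0:
        return Step(ToolCall("crop_and_zoom", {"image": IMAGE, "box": (0, 0, 2, 2), "scale": 2}))
    if len(observations) == 1:
        return Step(ToolCall("event_search", {"query": "bridge collapse"}))
    hits = observations[1]
    supported = any("confirmed" in snippet for snippet in hits)
    return Step(verdict="supported" if supported else "unsupported")


verdict, obs = run_agent(scripted_policy, TOOLS)
print(verdict)  # unsupported
print(obs[0])   # [[1, 1, 2, 2], [1, 1, 2, 2], [4, 4, 5, 5], [4, 4, 5, 5]]
```

Keeping tools behind a name-to-function registry means the policy only has to emit a tool name and arguments. This mirrors, in spirit, how a model can call tools it was never directly wired into, though the actual tool interface used by OmniVL-Guard Pro is not described here.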
