OmniVL-Guard Pro: A Tool-Augmented Agent for Omnibus Vision-Language Forensics

Jinjie Shen; Zheng Huang; Yuchen Zhang; Yujiao Wu; Yaxiong Wang; Lechao Cheng; Shengeng Tang; Tianrui Hui; Nan Pu; Zhun Zhong

arXiv:2605.16962 · cs.CV · May 21, 2026

TL;DR

OmniVL-Guard Pro is a tool-augmented agent that moves vision-language forensics from closed-world prediction to open-world, clue-driven reasoning. Rather than relying on the model alone, it calls external tools such as real-time event search and local cropping and zooming to support forgery detection and grounding.

Contribution

The paper introduces a tool-augmented agent framework, together with a new training dataset and a reinforcement learning approach, for open-world vision-language forensics.

Findings

01. Achieves state-of-the-art performance on multiple forensics tasks.

02. Demonstrates strong zero-shot generalization capabilities.

03. Outperforms existing methods in real-time event verification and forgery segmentation.

Abstract

Existing vision-language forgery detection and grounding methods operate under a closed-world paradigm, assuming verification can be completed by the model alone. However, self-contained MLLMs are constrained by finite parametric knowledge, static training corpora, and limited perceptual resolution. These constraints create a practical ceiling in dynamic open-world forensics, particularly for real-time event verification, which requires external clues, and forgery segmentation, which demands fine-grained scrutiny of local manipulations. To address these limitations, we shift from scaling up the self-contained model toward reaching beyond it. We propose OmniVL-Guard Pro, a tool-augmented agent that extends unified forensics from closed-world prediction to open-world clue-driven reasoning. OmniVL-Guard Pro integrates a tool environment spanning real-time event search, local cropping and zooming,…
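
To make the idea of a tool environment concrete, here is a minimal Python sketch of how an agent might dispatch a "local cropping and zooming" call. It is an illustration only, not the authors' implementation: the names `crop_and_zoom`, `TOOLS`, and `call_tool`, the nested-list image representation, and the nearest-neighbour upscaling are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical image type: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def crop_and_zoom(image: Image, top: int, left: int,
                  height: int, width: int, scale: int = 2) -> Image:
    """Crop a rectangle, then upscale it by repeating each pixel.

    Nearest-neighbour repetition stands in for whatever zoom the real
    tool uses; it is chosen here only because it needs no dependencies.
    """
    crop = [row[left:left + width] for row in image[top:top + height]]
    zoomed: Image = []
    for row in crop:
        # Repeat each pixel `scale` times horizontally ...
        wide = [px for px in row for _ in range(scale)]
        # ... and each widened row `scale` times vertically.
        zoomed.extend([list(wide) for _ in range(scale)])
    return zoomed


# Hypothetical tool registry: maps a tool name to a callable.
TOOLS: Dict[str, Callable[..., Image]] = {"crop_and_zoom": crop_and_zoom}


def call_tool(name: str, **kwargs) -> Image:
    """Look up a tool by name and run it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    patch = call_tool("crop_and_zoom", image=image,
                      top=1, left=1, height=2, width=2, scale=2)
    for row in patch:
        print(row)
```

In a setup like this, the agent would emit a tool name plus arguments, receive the enlarged patch, and inspect it in a later reasoning step. Other tools (for example, an event search) would be registered in the same dictionary under their own names.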

Code & Models

Repositories

shen8424/OmniVL-Guard-Pro (GitHub)
