OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL

Jinjie Shen; Jing Wu; Yaxiong Wang; Lechao Cheng; Shengeng Tang; Tianrui Hui; Nan Pu; Zhun Zhong

arXiv:2602.10687·cs.CV·May 18, 2026

TL;DR

OmniVL-Guard introduces a balanced reinforcement learning framework that enhances unified vision-language forgery detection and grounding, effectively addressing the challenge of multi-task optimization in multimodal misinformation detection.

Contribution

The paper proposes a balanced RL framework that combines Self-Evolving CoT Generation with ARSPO to improve multi-task learning for vision-language forgery detection and grounding.

Findings

01 Outperforms state-of-the-art methods in experiments.

02 Demonstrates zero-shot robustness across out-of-domain scenarios.

03 Achieves significant improvements in fine-grained grounding accuracy.

Abstract

Existing forgery detection methods are often limited to uni-modal or bi-modal settings and fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. In this unified setting, the interplay between diverse modalities and the dual requirements of simultaneous detection and localization pose a critical "difficulty bias" problem: the simpler veracity classification task tends to dominate the gradients, leading to suboptimal performance in fine-grained grounding during multi-task optimization. To address this challenge, we propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding. In particular, OmniVL-Guard comprises two core designs:…

Code & Models

Repositories

shen8424/OmniVL-Guard
github

Datasets

SJJ0854/FSFR
dataset· 123 dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts