GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Yue Liu; Shengfang Zhai; Mingzhe Du; Yulin Chen; Tri Cao; Hongcheng Gao; Cheng Wang; Xinfeng Li; Kun Wang; Junfeng Fang; Jiaheng Zhang; Bryan Hooi

arXiv:2505.11049·cs.AI·May 19, 2025

Open Access · 1 Repo · 4 Models · 3 Datasets

TL;DR

This paper presents GuardReasoner-VL, a reasoning-based guard model for vision-language models (VLMs). It is cold-started with SFT on a large reasoning corpus and then trained with online RL so that it reasons before making moderation decisions.

Contribution

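The paper introduces a reasoning-based VLM guard model trained with online RL, a reasoning corpus (GuardReasoner-VLTrain) spanning text, image, and text-image inputs, and training techniques for moderation that include safety-aware data concatenation and a dynamic clipping parameter.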

Findings

01. Surpasses baseline by 19.27% F1 score
02. Uses online RL with safety-aware rewards
03. Constructs a large reasoning corpus with 123K samples

Abstract

To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to incentivize the guard model to deliberatively reason before making moderation decisions via online RL. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. Then, based on it, we cold-start our model's reasoning ability via SFT. In addition, we further enhance reasoning regarding moderation through online RL. Concretely, to enhance diversity and difficulty of samples, we conduct rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. Besides, we use a dynamic clipping parameter to encourage exploration in early stages and exploitation in later stages. To balance performance and token efficiency, we design a…
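The dynamic clipping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual schedule, so the linear decay, the values 0.3 and 0.1, and the function names `dynamic_clip_eps` and `clipped_surrogate` are all assumptions made for illustration, not the paper's implementation. The sketch plugs the schedule into a standard PPO-style clipped surrogate: a wide clipping range early on allows larger policy updates (exploration), and a narrow one later keeps updates conservative (exploitation).

```python
def dynamic_clip_eps(step, total_steps, eps_start=0.3, eps_end=0.1):
    """Hypothetical schedule: linearly anneal the clipping parameter.

    eps_start and eps_end are illustrative values, not taken from the paper.
    """
    if total_steps <= 1:
        return eps_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def clipped_surrogate(ratio, advantage, eps):
    """PPO-style clipped objective for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    total = 101
    for step in (0, 50, 100):
        eps = dynamic_clip_eps(step, total)
        obj = clipped_surrogate(ratio=1.5, advantage=1.0, eps=eps)
        print(f"step={step:3d} eps={eps:.2f} objective={obj:.2f}")
```

With the same probability ratio, the early-stage objective rewards a larger step away from the old policy than the late-stage objective does, which is the exploration-to-exploitation shift the abstract describes.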

Code & Models

Repositories

yueliu1999/guardreasoner-vl (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Multimodal Machine Learning Applications · Topic Modeling

Methods: Shrink and Fine-Tune