ProGuard: Towards Proactive Multimodal Safeguard

Shaohan Yu; Lijun Li; Chenyang Si; Lu Sheng; Jing Shao

arXiv:2512.23573 · cs.CV · December 30, 2025

Open Access · 2 Models · 1 Dataset

TL;DR

ProGuard is a vision-language model designed to proactively identify and describe out-of-distribution safety risks in multimodal generative models, using a large annotated dataset and reinforcement learning for improved safety moderation.

Contribution

It introduces a large, balanced multimodal safety dataset and a reinforcement learning-based training method for proactive safety risk detection and description.

Findings

01. ProGuard matches large models in safety classification performance.

02. It significantly outperforms existing open-source guards in unsafe content categorization.

03. ProGuard improves OOD risk detection by 52.6% and OOD risk description by 64.8%.

Abstract

The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we propose ProGuard, a vision-language proactive guard that identifies and describes out-of-distribution (OOD) safety risks without the need for model adjustments required by traditional reactive approaches. We first construct a modality-balanced dataset of 87K samples, each annotated with both binary safety labels and risk categories under a hierarchical multimodal safety taxonomy, effectively mitigating modality bias and ensuring consistent moderation across text, image, and text-image inputs. Based on this dataset, we train our vision-language base model purely through reinforcement learning (RL) to achieve efficient and concise reasoning. To approximate proactive safety scenarios in a controlled…
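The annotation scheme described in the abstract (a binary safety label plus a risk category from a hierarchical taxonomy, spread across text, image, and text-image inputs) can be pictured with a small sketch. This is only an illustration under stated assumptions: the `SafetySample` record, its field names, the `modality_shares` helper, and the placeholder category names are all hypothetical and are not taken from the paper or from the released dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# The three input modalities named in the abstract.
MODALITIES = ("text", "image", "text-image")


@dataclass(frozen=True)
class SafetySample:
    """Hypothetical record layout; not the released dataset's schema."""
    modality: str                                     # one of MODALITIES
    is_unsafe: bool                                   # binary safety label
    category_path: Optional[Tuple[str, ...]] = None   # taxonomy path, None if safe

    def __post_init__(self):
        # Reject modalities outside the three listed above.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality}")
        # An unsafe sample must carry a risk category.
        if self.is_unsafe and not self.category_path:
            raise ValueError("unsafe samples need a risk category")


def modality_shares(samples):
    """Return the fraction of samples per modality (0.0 if there are no samples)."""
    counts = Counter(s.modality for s in samples)
    total = len(samples)
    return {m: (counts[m] / total if total else 0.0) for m in MODALITIES}


# Placeholder category names; the real taxonomy is defined in the paper.
samples = [
    SafetySample("text", False),
    SafetySample("image", True, ("placeholder-top-level", "placeholder-subcategory")),
    SafetySample("text-image", True, ("placeholder-top-level", "placeholder-subcategory")),
]
shares = modality_shares(samples)
```

In this toy example each modality accounts for one third of the samples, which is the sense in which a dataset could be called modality-balanced; how the authors actually measure or enforce balance is not stated in the excerpt above.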

Code & Models

Datasets

yushaohan/ProGuard-data
dataset · 30 downloads

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling