ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs

Shiyao Cui; Qinglin Zhang; Xuan Ouyang; Renmiao Chen; Zhexin Zhang; Yida Lu; Hongning Wang; Han Qiu; Minlie Huang

arXiv:2505.14035 · cs.MM · May 21, 2025

TL;DR

This paper introduces ShieldVLM, a model that uses deliberative reasoning to detect implicit toxicity in multimodal text-image content, addressing a previously underexplored challenge in toxicity moderation.

Contribution

It presents a comprehensive taxonomy and dataset for multimodal implicit toxicity, and develops ShieldVLM, the first model to effectively identify implicit toxicity through cross-modal reasoning.

Findings

01. ShieldVLM outperforms existing baselines in toxicity detection.

02. The MMIT-dataset contains 2,100 multimodal statements and prompts spanning 7 risk categories (31 sub-categories) and 5 cross-modal correlation modes.

03. The approach improves detection of both implicit and explicit toxicity.

Abstract

Toxicity detection in multimodal text-image content faces growing challenges, especially with multimodal implicit toxicity, where each modality appears benign on its own but the combination conveys harm. Multimodal implicit toxicity appears not only as formal statements on social platforms but also as prompts that can elicit toxic dialogs from Large Vision-Language Models (LVLMs). Despite success in unimodal text or image moderation, toxicity detection for multimodal content, particularly multimodal implicit toxicity, remains underexplored. To fill this gap, we build a taxonomy for multimodal implicit toxicity (MMIT) and introduce an MMIT-dataset comprising 2,100 multimodal statements and prompts across 7 risk categories (31 sub-categories) and 5 typical cross-modal correlation modes. To advance the detection of multimodal implicit toxicity, we build ShieldVLM, a…
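
The abstract's definition of implicit toxicity (each modality benign alone, harmful in combination) can be illustrated with a small labeling rule. The following is a minimal sketch, not the paper's method or the MMIT-dataset schema: the names `ModalityJudgments`, `ToxicityKind`, and `classify` are hypothetical, and the three boolean judgments are assumed to come from some external annotator or moderation model.

```python
from dataclasses import dataclass
from enum import Enum


class ToxicityKind(Enum):
    BENIGN = "benign"
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


@dataclass(frozen=True)
class ModalityJudgments:
    """Hypothetical per-sample judgments (not the MMIT-dataset schema)."""
    text_alone_toxic: bool    # is the text harmful when read by itself?
    image_alone_toxic: bool   # is the image harmful when viewed by itself?
    combined_toxic: bool      # is the text-image pair harmful as a whole?


def classify(j: ModalityJudgments) -> ToxicityKind:
    """Apply the definition: implicit means benign parts, harmful whole."""
    # Assumption: toxicity visible in a single modality counts as explicit.
    if j.text_alone_toxic or j.image_alone_toxic:
        return ToxicityKind.EXPLICIT
    # Neither modality is toxic alone, but the combination is.
    if j.combined_toxic:
        return ToxicityKind.IMPLICIT
    return ToxicityKind.BENIGN


if __name__ == "__main__":
    sample = ModalityJudgments(
        text_alone_toxic=False,
        image_alone_toxic=False,
        combined_toxic=True,
    )
    print(classify(sample).value)
```

The rule shows why unimodal moderation misses this case: a pipeline that checks only `text_alone_toxic` and `image_alone_toxic` would pass the sample above, since the harm only emerges from the cross-modal combination.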

Taxonomy

Topics: Natural Language Processing Techniques