MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang; Shuming Hu; Shiyu Zhao; Xiaowen Lin; Felix Juefei-Xu; Zhuowei Li; Ligong Han; Harihar Subramanyam; Li Chen; Jianfa Chen; Nan Jiang; Lingjuan Lyu; Shiqing Ma; Dimitris N. Metaxas; Ankit Jain

arXiv:2501.00192·cs.CV·April 8, 2025

Open Access

TL;DR

This paper introduces a novel zero-shot approach using pre-trained Multimodal Large Language Models (MLLMs) to identify unsafe images based on safety rules, eliminating the need for human-labeled data and enabling flexible, rule-based safety assessments.

Contribution

The authors propose a new MLLM-based method that objectifies safety rules, assesses relevance, and uses chain-of-thought reasoning to improve zero-shot image safety judgment accuracy.
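As a rough illustration of how those three ingredients could fit together, here is a minimal Python sketch. It is not the authors' implementation. It assumes the safety rules have already been rewritten as objective statements, and it takes two hypothetical callables that are not from the paper or any real library: `relevance_fn(image, rule)`, which returns a score in [0, 1], and `judge_fn(image, prompt)`, which returns the free-text answer of some MLLM. The threshold value and the prompt wording are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions for this sketch, not a real API):
#   relevance_fn(image, rule) -> relevance score in [0, 1]
#   judge_fn(image, prompt)   -> free-text answer from an MLLM
RelevanceFn = Callable[[object, str], float]
JudgeFn = Callable[[object, str], str]


@dataclass
class Verdict:
    unsafe: bool
    violated_rule: Optional[str]
    reasoning: str


def build_cot_prompt(rule: str) -> str:
    """Ask for step-by-step reasoning that ends in a parseable answer line."""
    return (
        f"Safety rule: {rule}\n"
        "Describe what the image shows, reason step by step about whether "
        "it violates the rule, then finish with a final line that reads "
        "'ANSWER: yes' or 'ANSWER: no'."
    )


def parse_answer(text: str) -> bool:
    """Return True only if the last 'ANSWER:' line says yes."""
    for line in reversed(text.strip().splitlines()):
        line = line.strip().lower()
        if line.startswith("answer:"):
            return line.split(":", 1)[1].strip().startswith("yes")
    return False  # no answer line found: treat as "no violation detected"


def judge_image(
    image: object,
    rules: List[str],
    relevance_fn: RelevanceFn,
    judge_fn: JudgeFn,
    relevance_threshold: float = 0.5,  # assumed value, not from the paper
) -> Verdict:
    """Check an image against each objectified rule, skipping irrelevant ones."""
    for rule in rules:
        # Relevance check: skip rules that clearly do not apply to this image.
        if relevance_fn(image, rule) < relevance_threshold:
            continue
        # Chain-of-thought judgment for the rules that remain.
        reasoning = judge_fn(image, build_cot_prompt(rule))
        if parse_answer(reasoning):
            return Verdict(True, rule, reasoning)
    return Verdict(False, None, "")
```

Because the rules are passed in as plain strings, changing the safety policy in this sketch means editing the rule list rather than relabeling data or retraining, which is the flexibility the summary above describes.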

Findings

01 High effectiveness in zero-shot safety judgment tasks

02 Outperforms traditional fine-tuning approaches

03 Reduces reliance on human-labeled datasets

Abstract

Image content safety has become a significant challenge with the rise of visual media on online platforms. Meanwhile, in the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content, such as images containing sexual or violent material. It is therefore crucial to identify such unsafe images based on established safety rules. Pre-trained Multimodal Large Language Models (MLLMs) offer potential in this regard, given their strong pattern recognition abilities. Existing approaches typically fine-tune MLLMs on human-labeled datasets, which brings a series of drawbacks. First, relying on human annotators to label data according to intricate and detailed guidelines is both expensive and labor-intensive. Furthermore, users of safety judgment systems may need to frequently update safety rules, making fine-tuning on human-based annotation…

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications · Brain Tumor Detection and Classification

Methods: Sparse Evolutionary Training