Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot

Sheng Hang; Chaoxiang He; Hongsheng Hu; Hanqing Hu; Bin Benjamin Zhu; Shi-Feng Sun; Dawu Gu; Shuo Wang

arXiv:2512.04599 · cs.CV · December 5, 2025

Open Access

TL;DR

This paper presents a zero-shot pipeline that fuses foundation-model segmentation with vision-language scoring to detect harmful images, identify the elements that make them harmful, and localize those elements with pixel-level masks in a single pass, supporting fine-grained moderation of illicit visual content.

Contribution

The authors introduce a one-pass pipeline that combines a foundation segmentation model with a vision-language model for fine-grained detection and localization of malicious content, and that ensembles multiple segmenters to resist adaptive attacks targeting any single segmentation method.

Findings

01. Achieves 85.8% element-level recall and 78.1% precision on malicious content detection.

02. Outperforms direct zero-shot VLM localization by 27.4% recall at similar precision.

03. Demonstrates robustness with less than 10% performance drop under adversarial attacks.

Abstract

Detecting illicit visual content demands more than image-level NSFW flags; moderators must also know what objects make an image illegal and where those objects occur. We introduce a zero-shot pipeline that simultaneously (i) detects whether an image contains harmful content, (ii) identifies each critical element involved, and (iii) localizes those elements with pixel-accurate masks, all in one pass. The system first applies a foundation segmentation model (SAM) to generate candidate object masks and refines them into larger independent regions. Each region is scored for malicious relevance by a vision-language model using open-vocabulary prompts; these scores weight a fusion step that produces a consolidated malicious object map. An ensemble across multiple segmenters hardens the pipeline against adaptive attacks that target any single segmentation method. Evaluated on a newly-annotated…
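The data flow described in the abstract (segment, score each region, fuse the scores into a map, ensemble across segmenters) can be illustrated with a minimal Python sketch. The abstract does not state the fusion rule, the ensemble aggregation, or any threshold, so the choices here are assumptions made only for illustration: a per-pixel maximum over region scores, a per-pixel mean across segmenters, and a cut-off of 0.5. The names `fuse_regions`, `ensemble_maps`, `analyze`, `segmenters`, and `score_region` are hypothetical; the last two stand in for SAM-style mask generators and a vision-language scorer, and the step that refines masks into larger regions is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Mask = List[List[bool]]        # binary region mask, height x width
ScoreMap = List[List[float]]   # per-pixel malicious score in [0, 1]


def fuse_regions(masks: Sequence[Mask], scores: Sequence[float],
                 height: int, width: int) -> ScoreMap:
    """Assumed fusion rule: each pixel takes the highest score of any region covering it."""
    fused = [[0.0] * width for _ in range(height)]
    for mask, score in zip(masks, scores):
        for y in range(height):
            for x in range(width):
                if mask[y][x] and score > fused[y][x]:
                    fused[y][x] = score
    return fused


def ensemble_maps(maps: Sequence[ScoreMap]) -> ScoreMap:
    """Assumed ensemble rule: per-pixel mean of the maps from each segmenter."""
    n = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / n for x in range(width)]
            for y in range(height)]


def analyze(image: List[List[float]],
            segmenters: Sequence[Callable[[List[List[float]]], List[Mask]]],
            score_region: Callable[[List[List[float]], Mask], float],
            threshold: float = 0.5) -> Tuple[bool, Mask, ScoreMap]:
    """Return (image-level flag, binary malicious mask, fused score map)."""
    height, width = len(image), len(image[0])
    per_segmenter = []
    for segment in segmenters:
        masks = segment(image)                                # candidate regions
        scores = [score_region(image, m) for m in masks]      # hypothetical VLM scores
        per_segmenter.append(fuse_regions(masks, scores, height, width))
    fused = ensemble_maps(per_segmenter)
    binary = [[value >= threshold for value in row] for row in fused]
    is_malicious = any(any(row) for row in binary)            # flag if any pixel passes
    return is_malicious, binary, fused
```

Averaging across segmenters means that a region produced or suppressed by one attacked segmenter only partly shifts the final map, which is one plausible reading of how an ensemble could blunt attacks on a single segmentation method; the paper's actual mechanism may differ.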

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Generative Adversarial Networks and Image Synthesis