When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI

Yanhui Li; Qi Zhou; Zhihong Xu; Huizhong Guo; Wenhai Wang; Dongxia Wang

arXiv:2512.03087·cs.MM·December 4, 2025

Open Access

TL;DR

This paper introduces CamHarmTI, a benchmark revealing that current LVLMs struggle to detect camouflaged harmful content in text-image posts, highlighting perceptual gaps compared to humans and suggesting avenues for model improvement.

Contribution

The paper presents CamHarmTI, a new benchmark for evaluating LVLM perception of camouflaged harmful content, and demonstrates how fine-tuning improves model sensitivity and understanding.

Findings

01. Humans recognize camouflaged harmful content with over 95.75% accuracy.

02. Current LVLMs perform poorly, with ChatGPT-4o achieving only 2.10% accuracy.

03. Fine-tuning increases model accuracy by 55.94% and enhances early-layer sensitivity.

Abstract

Large vision-language models (LVLMs) are increasingly used for tasks where detecting multimodal harmful content is crucial, such as online content moderation. However, real-world harmful content is often camouflaged, relying on nuanced text-image interplay, such as memes or images with embedded malicious text, to evade detection. This raises a key question: can LVLMs perceive such camouflaged harmful content as sensitively as humans do? In this paper, we introduce CamHarmTI, a benchmark for evaluating LVLM ability to perceive and interpret camouflaged harmful content within text-image compositions. CamHarmTI consists of over 4,500 samples across three types of image-text posts. Experiments on 100 human users and 12 mainstream LVLMs reveal a clear perceptual gap: humans easily recognize such content (e.g., over 95.75% accuracy), whereas current LVLMs often fail (e.g.,…

Taxonomy

Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis