VLMs Can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang, Chaochao Lu

TL;DR
This paper reveals that vision-language models can piece together scattered image patches to reconstruct harmful content, highlighting a new safety risk in data moderation practices.
Contribution
The study introduces the concept of visual stitching in VLMs, demonstrating their ability to integrate scattered visual information and pose safety challenges.
Findings
VLMs can verbalize correct IDs from image patches.
Models can reconstruct harmful content from scattered patches.
Visual stitching enables bypassing data moderation.
Abstract
One way to mitigate risks in vision-language models (VLMs) is to remove dangerous samples in their training data. However, such data moderation can be easily bypassed when harmful images are split into small, benign-looking patches, scattered across many training samples. VLMs may then learn to piece these fragments together during training and generate harmful responses at inference, either from full images or text references. For instance, if trained on image patches from a bloody scene paired with the descriptions "safe," VLMs may later describe, the full image or a text reference to the scene, as "safe." We define the core ability of VLMs enabling this attack as -- the ability to integrate visual information spread across multiple training samples that share the same textual descriptions. In our work, we first demonstrate visual stitching abilities in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Ethics and Social Impacts of AI
