Exposing Vulnerabilities in Visible-Infrared VLMs: A Unified Geometric Adversarial Framework with Cross-Task Transferability

Xiang Chen; Yuxian Dong; Chao Li; Chengyin Hu; Jiaju Han; Fengyu Zhang; Yiwei Wei; Jiahuan Long; Jiujiang Guo

arXiv:2605.22273 · cs.CV · May 22, 2026

TL;DR

This paper introduces CFGPatch, a novel geometric adversarial patch framework that exploits cross-modal vulnerabilities in visible-infrared vision-language models, demonstrating high attack success and transferability across tasks.

Contribution

The paper presents CFGPatch, a unified curved-edge fractal adversarial framework that enhances attack effectiveness and cross-task transferability in VIS-IR VLMs.

Findings

01. CFGPatch fools VIS-IR VLMs with high attack success rates.

02. Its adversarial samples transfer well to image captioning and VQA tasks.

03. CFGPatch outperforms standard patch baselines in robustness and effectiveness.

Abstract

Vision-language models (VLMs) have achieved strong performance across diverse multimodal tasks, but their adversarial robustness in visible-infrared (VIS-IR) scenarios remains underexplored. This gap is critical because VIS-IR sensing is widely used in real-world perception systems to support reliable understanding under challenging imaging conditions. To address this cross-modal threat setting, we propose CFGPatch, a curved-edge fractal geometric adversarial patch framework for attacking VIS-IR VLMs. CFGPatch builds on triangular fractal geometry and replaces rigid straight-edged primitives with Bezier-curved elements, preserving multi-scale fractal self-similarity while introducing smoother contours, richer directional variation, and more flexible shape deformation. In addition, we design a modality-specific Fraser-spiral rendering mechanism to inject fine-grained texture distortions…
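
The geometric idea in the abstract (a triangular fractal whose straight edges are replaced by Bezier curves) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Sierpinski-style subdivision as the triangular fractal and a quadratic Bezier per edge, and the names `sierpinski`, `curved_edge`, `curved_fractal`, and the `bulge` parameter are hypothetical choices made for illustration only. The Fraser-spiral rendering and the attack optimization are not covered.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def sierpinski(tri: Triangle, depth: int) -> List[Triangle]:
    """Assumed fractal: keep the three corner sub-triangles at each level."""
    if depth == 0:
        return [tri]
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    out: List[Triangle] = []
    for sub in ((a, ab, ca), (ab, b, bc), (ca, bc, c)):
        out.extend(sierpinski(sub, depth - 1))
    return out


def curved_edge(p: Point, q: Point, bulge: float, samples: int = 8) -> List[Point]:
    """Sample a quadratic Bezier curve from p to q.

    The control point is the edge midpoint pushed along the edge normal
    by bulge * edge_length (hypothetical parameterization); bulge = 0
    gives back the straight edge.
    """
    mx, my = midpoint(p, q)
    dx, dy = q[0] - p[0], q[1] - p[1]
    # (-dy, dx) is a normal whose length equals the edge length.
    cx, cy = mx - bulge * dy, my + bulge * dx
    pts: List[Point] = []
    for i in range(samples + 1):
        t = i / samples
        u = 1.0 - t
        x = u * u * p[0] + 2 * u * t * cx + t * t * q[0]
        y = u * u * p[1] + 2 * u * t * cy + t * t * q[1]
        pts.append((x, y))
    return pts


def curved_triangle(tri: Triangle, bulge: float, samples: int = 8) -> List[Point]:
    """Closed outline of a triangle with all three edges curved."""
    a, b, c = tri
    outline: List[Point] = []
    for p, q in ((a, b), (b, c), (c, a)):
        # Drop each edge's last sample; it is the next edge's first point.
        outline.extend(curved_edge(p, q, bulge, samples)[:-1])
    return outline


def curved_fractal(depth: int, bulge: float) -> List[List[Point]]:
    """Curved-edge outlines for every leaf triangle of the fractal."""
    base: Triangle = ((0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2))
    return [curved_triangle(t, bulge) for t in sierpinski(base, depth)]
```

Calling `curved_fractal(3, 0.2)` returns one outline per leaf triangle, each a list of (x, y) points in the unit square that could be rasterized into a patch mask. Because every level reuses the same curved primitive at half the scale, the multi-scale self-similarity is preserved while the contours become smooth.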
