Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
Jiwei Guan, Haibo Jin, Haohan Wang

TL;DR
This paper introduces a black-box attack method that uses zeroth-order optimization to craft adversarial inputs for large vision-language models, exposing vulnerabilities using only input-output queries, with no access to model internals.
Contribution
The paper proposes ZO-SPSA, a gradient-free, model-agnostic black-box attack method that reduces resource use and improves transferability for attacking LVLMs.
Findings
Achieved an attack success rate of up to 83.0% on InstructBLIP
Generated adversarial examples with imperceptible perturbations
Demonstrated strong transferability of attacks across models
Abstract
Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jailbreak attacks, in which adversaries craft subtle perturbations to bypass safety mechanisms and trigger harmful outputs. Existing white-box attack methods require full model access, incur high computational costs, and exhibit insufficient adversarial transferability, making them impractical for real-world, black-box settings. To address these limitations, we propose a black-box jailbreak attack on LVLMs via Zeroth-Order optimization using Simultaneous Perturbation Stochastic Approximation (ZO-SPSA). ZO-SPSA provides three key advantages: (i) gradient-free approximation through input-output interactions without requiring model knowledge, (ii) model-agnostic optimization without a surrogate model and…
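The gradient-free approximation mentioned in the abstract can be illustrated with a generic SPSA estimator. The sketch below is not the authors' implementation: the function names (spsa_gradient, spsa_minimize), the step sizes, the L-infinity budget eps, and the toy quadratic loss standing in for a query-only model score are all assumptions made for illustration. It shows only the general idea that two loss queries per step, along a random ±1 direction, are enough to estimate a descent direction without any access to gradients.

```python
import random


def spsa_gradient(loss, x, c=0.01, rng=random):
    """Estimate the gradient of `loss` at `x` using only two loss queries."""
    # Rademacher direction: every coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    # Central difference along delta; the query count does not grow with len(x).
    diff = (loss(x_plus) - loss(x_minus)) / (2.0 * c)
    # For +/-1 entries, dividing by delta_i is the same as multiplying by it.
    return [diff * di for di in delta]


def spsa_minimize(loss, x0, eps=0.1, step=0.05, c=0.01, iters=200, seed=0):
    """Descend on `loss`, keeping every coordinate within eps of x0 (assumed budget)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = spsa_gradient(loss, x, c, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        # Project back into the L-infinity ball around the starting point.
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


# Toy stand-in for a black-box score (hypothetical): squared distance to a target.
target = [0.01, 0.02, 0.04, 0.08]


def toy_loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


x0 = [0.0, 0.0, 0.0, 0.0]
x_adv = spsa_minimize(toy_loss, x0)
```

In this toy setting the optimizer only ever calls toy_loss, which mirrors the black-box assumption: replacing it with any other function that returns a scalar score requires no change to the estimator.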
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
