Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models

Atharv Mittal; Agam Pandey; Amritanshu Tiwari; Sukrit Jindal; Swadesh Swain

arXiv:2506.22982·cs.CV·May 4, 2026

TL;DR

This paper reproduces and enhances CroPA, a method for creating transferable adversarial attacks on vision-language models, demonstrating improved success rates and broader applicability across multiple models.

Contribution

It validates CroPA's effectiveness and introduces novel strategies to significantly improve adversarial attack success and generalization in vision-language models.

Findings

01 CroPA's transferability is confirmed across multiple VLMs.

02 Proposed improvements boost attack success rate significantly.

03 Universal perturbations enhance cross-image transferability.

Abstract

Large Vision-Language Models (VLMs) have revolutionized computer vision, enabling tasks such as image classification, captioning, and visual question answering. However, they remain highly vulnerable to adversarial attacks, particularly in scenarios where both visual and textual modalities can be manipulated. In this study, we conduct a comprehensive reproducibility study of "An Image is Worth 1000 Lies: Adversarial Transferability Across Prompts on Vision-Language Models", validating the Cross-Prompt Attack (CroPA) and confirming its superior cross-prompt transferability compared to existing baselines. Beyond replication, we propose several key improvements: (1) a novel initialization strategy that significantly improves Attack Success Rate (ASR); (2) an investigation of cross-image transferability through learned universal perturbations; (3) a novel loss function targeting vision encoder attention…
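
The abstract reports results in terms of Attack Success Rate (ASR) without defining it here. As a rough illustration only, the sketch below treats ASR as the fraction of model outputs that match the attacker's target text. The matching rule (case-insensitive exact match), the function name `attack_success_rate`, and the sample outputs are all assumptions made for this example and are not taken from the paper.

```python
from typing import Iterable


def attack_success_rate(outputs: Iterable[str], target: str) -> float:
    """Return the fraction of outputs that match the target text.

    Assumption: an attack counts as successful when the model output,
    after trimming whitespace and lowercasing, equals the target.
    The paper may use a different matching criterion.
    """
    normalized_target = target.strip().lower()
    collected = list(outputs)
    if not collected:
        raise ValueError("need at least one model output")
    hits = sum(1 for text in collected if text.strip().lower() == normalized_target)
    return hits / len(collected)


# Hypothetical model outputs for one adversarial image under four prompts.
example_outputs = ["unknown", "Unknown ", "a dog on a beach", "unknown"]
asr = attack_success_rate(example_outputs, target="unknown")
print(f"ASR = {asr:.2f}")  # 3 of 4 outputs match the target
```

Under this reading, cross-prompt transferability corresponds to a high ASR when the same perturbed image is evaluated against many different prompts.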
