TL;DR
HQA-VLAttack is a framework for black-box adversarial attacks on vision-language pre-trained models. It generates adversarial examples in two stages: semantic-preserving text perturbations, and image perturbations optimized with contrastive learning.
Contribution
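The paper proposes a simple attack that raises success rates by combining semantically consistent text substitutions with contrastive learning on images. It targets two weaknesses of prior methods: iterative cross-search strategies that consume many queries, and objectives that reduce only the similarity of positive image-text pairs while ignoring negative pairs.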
Findings
Outperforms strong baselines in attack success rate on benchmark datasets.
Utilizes contrastive learning to optimize image adversarial perturbations.
Ensures semantic consistency in text perturbations using counter-fitting word vectors.
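The semantic-consistency finding can be illustrated with a small sketch of filtering substitute words by cosine similarity in a word-vector space. This is an illustration under stated assumptions, not the paper's implementation: the function names, the 0.8 threshold, and the three-dimensional toy vectors are all hypothetical stand-ins for real counter-fitted embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_substitutes(word, candidates, vectors, threshold=0.8):
    """Keep candidates whose vector is close to the original word's vector.

    The threshold is an assumed value chosen for this toy example.
    """
    base = vectors[word]
    kept = []
    for cand in candidates:
        if cand in vectors and cosine(base, vectors[cand]) >= threshold:
            kept.append(cand)
    return kept

# Hypothetical toy vectors; real counter-fitted vectors are much larger.
TOY_VECTORS = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.85, 0.15, 0.05],
    "small": [-0.8, 0.2, 0.1],
    "dog": [0.0, 0.1, 0.95],
}

substitutes = filter_substitutes("big", ["large", "small", "dog"], TOY_VECTORS)
```

In this toy space the antonym and the unrelated word fall below the threshold, so only the near-synonym survives as a substitution candidate.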
Abstract
Black-box adversarial attack on vision-language pre-trained models is a practical and challenging task, as text and image perturbations need to be considered simultaneously, and only the predicted results are accessible. Research on this problem is in its infancy, and only a handful of methods are available. Nevertheless, existing methods either rely on a complex iterative cross-search strategy, which inevitably consumes numerous queries, or only consider reducing the similarity of positive image-text pairs but ignore that of negative ones, which will also be implicitly diminished, thus inevitably affecting the attack performance. To alleviate the above issues, we propose a simple yet effective framework to generate high-quality adversarial examples on vision-language pre-trained models, named HQA-VLAttack, which consists of text and image attack stages. For text perturbation…
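The abstract's point about positive and negative pairs can be made concrete with a generic InfoNCE-style contrastive score. This sketch is not the paper's objective, which the truncated abstract does not state. It assumes the attacker can obtain embeddings from some model it can evaluate, and every name and value here (`info_nce`, the temperature of 0.1, the two-dimensional embeddings) is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(image_emb, pos_text_emb, neg_text_embs, temperature=0.1):
    """Generic InfoNCE loss for one image against one positive and several negative texts.

    A higher value means the image matches its positive text less well
    relative to the negatives, which is what an attacker would push toward.
    """
    img = normalize(image_emb)
    texts = [pos_text_emb] + list(neg_text_embs)
    logits = [dot(img, normalize(t)) / temperature for t in texts]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Hypothetical embeddings: the clean image aligns with its caption,
# the perturbed image has drifted toward a mismatched caption.
pos_text = [1.0, 0.0]
neg_texts = [[0.0, 1.0]]
clean_loss = info_nce([1.0, 0.0], pos_text, neg_texts)
adv_loss = info_nce([0.2, 1.0], pos_text, neg_texts)
```

Because the negatives appear in the denominator, the score rises both when similarity to the positive text falls and when similarity to a negative text grows. A loss built only on the positive pair does not capture the second effect.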