Natural Language Induced Adversarial Images
Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yingpeng Dong, Xiaolin Hu

TL;DR
This paper introduces a novel adversarial attack method that uses natural language prompts and text-to-image models to generate semantically meaningful adversarial images, revealing vulnerabilities in deep learning classifiers.
Contribution
It proposes a natural language induced adversarial image attack that uses text-to-image models to generate the images, searches for adversarial prompts with a genetic algorithm, and uses CLIP to maintain semantic consistency.
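To make the genetic-algorithm idea concrete, here is a minimal sketch of a prompt search. It is an illustration under stated assumptions, not the authors' implementation: the word space, the prompt template, `toy_attack_score` (a stand-in for generating an image and querying the target classifier), and `semantically_consistent` (a stand-in for a CLIP-based check) are all hypothetical and exist only so the loop runs without any model.

```python
import random

# Hypothetical word space: each prompt "gene" picks one word or phrase per slot.
WORD_SPACE = {
    "weather": ["sunny", "foggy", "rainy", "snowy"],
    "color": ["white", "black", "brown"],
    "place": ["on grass", "in a street", "on a beach"],
}
SLOTS = list(WORD_SPACE)


def build_prompt(genes, subject="dog"):
    """Fill a fixed (assumed) template with the chosen words."""
    return f"a {genes['color']} {subject} {genes['place']} on a {genes['weather']} day"


def toy_attack_score(genes):
    """Stand-in for: text-to-image -> classifier -> misclassification score.
    The numbers are made up so the search has something to climb."""
    score = 0.1
    if genes["weather"] == "foggy":
        score += 0.5
    if genes["place"] == "in a street":
        score += 0.2
    if genes["color"] == "white":
        score += 0.1
    return score


def semantically_consistent(genes):
    """Stand-in for a CLIP-style consistency check; the rejected
    combination is arbitrary and purely illustrative."""
    return not (genes["weather"] == "snowy" and genes["place"] == "on a beach")


def fitness(genes):
    # Candidates that fail the consistency check get zero fitness.
    return toy_attack_score(genes) if semantically_consistent(genes) else 0.0


def random_individual(rng):
    return {slot: rng.choice(WORD_SPACE[slot]) for slot in SLOTS}


def crossover(a, b, rng):
    # Uniform crossover: each slot is inherited from one parent at random.
    return {slot: rng.choice([a[slot], b[slot]]) for slot in SLOTS}


def mutate(ind, rng, rate=0.2):
    # With probability `rate`, resample a slot from the word space.
    return {
        slot: rng.choice(WORD_SPACE[slot]) if rng.random() < rate else ind[slot]
        for slot in SLOTS
    }


def genetic_search(pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Calling `build_prompt(genetic_search())` returns the best prompt found under the toy score. In a real setting, the two stand-in functions would be replaced by calls to an actual text-to-image model, the target classifier, and a CLIP similarity check, each of which is far more expensive per evaluation than this toy.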
Findings
Semantic adversarial cues like 'foggy' cause classifier errors
Adversarial semantic information transfers across models and tasks
The method works with multiple text-to-image models and classifiers
Abstract
Research on adversarial attacks is important for AI security because it exposes the vulnerability of deep learning models and helps build more robust ones. Adversarial attacks on images are the most widely studied and include noise-based attacks, image editing-based attacks, and latent space-based attacks. However, the adversarial examples crafted by these methods often lack sufficient semantic information, making it hard for humans to understand how deep learning models fail under natural conditions. To address this limitation, we propose a natural language induced adversarial image attack method. The core idea is to leverage a text-to-image model to generate adversarial images from input prompts that are maliciously constructed to make a target model misclassify. To adopt commercial text-to-image models for synthesizing more natural adversarial…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Contrastive Language-Image Pre-training
