Natural Language Induced Adversarial Images

Xiaopei Zhu; Peiyang Xu; Guanning Zeng; Yingpeng Dong; Xiaolin Hu

arXiv:2410.08620·cs.CR·October 14, 2024

TL;DR

This paper introduces a novel adversarial attack method that uses natural language prompts and text-to-image models to generate semantically meaningful adversarial images, revealing vulnerabilities in deep learning classifiers.

Contribution

The paper proposes a natural language induced adversarial image attack that generates images with text-to-image models, optimizes the input prompts with a genetic algorithm, and uses CLIP to maintain semantic consistency.
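The prompt search described above can be sketched as a genetic algorithm over discrete prompt words. This is only an illustrative sketch, not the authors' implementation: the word slots, the prompt template, `toy_fitness`, and every hyperparameter are assumptions made for the example. In a real attack the fitness would presumably come from generating an image with a text-to-image model, querying the target classifier, and checking semantic consistency with CLIP; a deterministic stand-in replaces all three here so the block runs on its own.

```python
import random

# Hypothetical word slots for a prompt template (not taken from the paper).
WORD_SPACE = {
    "weather": ["sunny", "foggy", "rainy", "snowy"],
    "viewpoint": ["front", "side", "top"],
    "background": ["street", "forest", "beach"],
}
SLOTS = list(WORD_SPACE)


def render_prompt(genome, subject="dog"):
    """Turn a genome (one word per slot) into a text prompt."""
    weather, viewpoint, background = genome
    return (f"a photo of a {subject}, {weather} weather, "
            f"{viewpoint} view, {background} background")


def toy_fitness(genome):
    """Stand-in score. A real fitness would generate an image from the
    prompt, measure how strongly the target classifier errs on it, and
    penalize images whose CLIP similarity to the subject is too low."""
    score = 0.0
    if genome[0] == "foggy":   # assumed adversarial cue for this toy
        score += 1.0
    if genome[1] == "top":     # assumed, purely illustrative
        score += 0.5
    return score


def random_genome(rng):
    return tuple(rng.choice(WORD_SPACE[slot]) for slot in SLOTS)


def mutate(genome, rng, rate):
    """Resample each gene from its slot with probability `rate`."""
    return tuple(
        rng.choice(WORD_SPACE[slot]) if rng.random() < rate else gene
        for slot, gene in zip(SLOTS, genome)
    )


def evolve(fitness, pop_size=8, generations=10, mutation_rate=0.2, seed=0):
    """Elitist genetic algorithm: keep the top half, refill with children."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each gene from either parent.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            children.append(mutate(child, rng, mutation_rate))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(render_prompt(best), toy_fitness(best))
```

Because the loop only ever calls `fitness(genome)`, swapping `toy_fitness` for a function that scores real generated images would leave the search itself unchanged, which is what makes a gradient-free method like this usable with black-box text-to-image models.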

Findings

01. Semantic adversarial cues such as 'foggy' cause classifier errors.

02. Adversarial semantic information transfers across models and tasks.

03. The method works with multiple text-to-image models and classifiers.

Abstract

Research on adversarial attacks is important for AI security because it exposes the vulnerability of deep learning models and helps to build more robust ones. Adversarial attacks on images are the most widely studied and include noise-based attacks, image editing-based attacks, and latent space-based attacks. However, the adversarial examples crafted by these methods often lack sufficient semantic information, making it hard for humans to understand the failure modes of deep learning models under natural conditions. To address this limitation, we propose a natural language induced adversarial image attack method. The core idea is to leverage a text-to-image model to generate adversarial images from input prompts that are maliciously constructed to cause misclassification by a target model. To adopt commercial text-to-image models for synthesizing more natural adversarial…

Code & Models

Repositories

zxp555/natural-language-induced-adversarial-images
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Contrastive Language-Image Pre-training