Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
Wenqiang Wang, Yan Xiao, Hao Lin, Yangshijie Zhang, Xiaochun Cao

TL;DR
This paper introduces CEMA, a black-box multi-task adversarial attack method that efficiently generates adversarial texts across various tasks using minimal queries and transferability, even against commercial APIs and large models.
Contribution
CEMA is a novel multi-task attack framework that leverages deep substitute models and transferability, enabling effective attacks with limited queries across diverse tasks and models.
Findings
Achieves high attack success with as few as 100 queries.
Effective against multiple tasks including classification, translation, and image generation.
Successfully targets commercial APIs and large language models.
Abstract
Current multi-task adversarial text attacks rely on abundant access to shared internal features and numerous queries, often limited to a single task type. As a result, these attacks are less effective against practical scenarios involving black-box feedback APIs, limited queries, or multiple task types. To bridge this gap, we propose Cluster and Ensemble Multi-task Text Adversarial Attack (CEMA), an effective black-box attack that exploits the transferability of adversarial texts across different tasks. CEMA simplifies complex multi-task scenarios by using a deep-level substitute model trained in a plug-and-play manner for text classification, enabling attacks without mimicking the victim model. This approach requires only a few queries for training, converting multi-task attacks into classification attacks and allowing…
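The idea of converting a multi-task attack into a classification attack can be illustrated with a toy sketch. It rests on assumptions the abstract does not confirm. The clustering of victim outputs into pseudo-labels is inferred only from the word "Cluster" in the method's name. The bag-of-words features and nearest-centroid classifier are simple stand-ins for the deep-level substitute model. The names toy_victim, cluster_outputs, and train_substitute are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Element-wise mean of a non-empty list of Counter vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def toy_victim(text):
    """Hypothetical black-box API: returns free-form text, never a class label."""
    if any(w in text.lower().split() for w in ("great", "good", "love")):
        return "positive review praise"
    return "negative review complaint"


def cluster_outputs(outputs, k=2, iters=10):
    """Assumed step: k-means-style clustering of victim outputs into pseudo-labels."""
    vecs = [bow(o) for o in outputs]
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(
            min(vecs, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vecs]
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


def train_substitute(inputs, labels):
    """Nearest-centroid classifier on the inputs; a stand-in for a deep model."""
    classes = sorted(set(labels))
    centroids = {
        c: mean([bow(x) for x, lab in zip(inputs, labels) if lab == c])
        for c in classes
    }

    def predict(text):
        v = bow(text)
        return max(classes, key=lambda c: cosine(v, centroids[c]))

    return predict


# A handful of queries to the black box is the only access the attacker has.
queries = [
    "great movie love it",
    "good plot great cast",
    "terrible movie hated it",
    "boring plot weak cast",
]
outputs = [toy_victim(q) for q in queries]
labels = cluster_outputs(outputs, k=2)
predict = train_substitute(queries, labels)
```

Once such a substitute exists, any standard text-classification attack could be run against predict, with the resulting adversarial texts then tried on the real victim. Whether they transfer is the empirical question the paper studies.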
