Cross-Entropy Attacks to Language Models via Rare Event Simulation
Mingze Ni, Yongshun Gong, Wei Liu

TL;DR
This paper presents Cross-Entropy Attacks (CEA), a novel black-box adversarial attack method for language models that improves attack success, imperceptibility, and sentence quality by using cross-entropy (CE) optimization to select word replacements.
Contribution
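The paper introduces CEA, a versatile and efficient cross-entropy-based approach to black-box textual adversarial attacks. It targets three limitations of previous methods: limited versatility across models, inefficient optimization based on word saliency ranking, and loss of semantic integrity.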
Findings
CEA outperforms existing attacks in success rate
CEA maintains high semantic integrity and sentence quality
CEA is effective across document classification and translation tasks
Abstract
Black-box textual adversarial attacks are challenging due to the lack of model information and the discrete, non-differentiable nature of text. Existing methods often lack versatility for attacking different models, suffer from limited attacking performance due to the inefficient optimization with word saliency ranking, and frequently sacrifice semantic integrity to achieve better attack outcomes. This paper introduces a novel approach to textual adversarial attacks, which we call Cross-Entropy Attacks (CEA), that uses Cross-Entropy optimization to address the above issues. Our CEA approach defines adversarial objectives for both soft-label and hard-label settings and employs CE optimization to identify optimal replacements. Through extensive experiments on document classification and language translation problems, we demonstrate that our attack method excels in terms of attacking…
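To illustrate the kind of search the abstract describes, the following is a minimal sketch of the generic cross-entropy method applied to choosing one replacement per word position. It is not the authors' implementation: the excerpt does not give the paper's adversarial objectives or sampling distributions, so the function name `cross_entropy_search`, its parameters, the candidate lists, and the toy score are all hypothetical stand-ins. In a real black-box attack, `score` would wrap queries to the victim model.

```python
import random


def cross_entropy_search(candidates, score, n_samples=100, elite_frac=0.2,
                         smoothing=0.7, n_iters=20, seed=0):
    """Generic cross-entropy method over per-position categorical choices.

    candidates: list of candidate-word lists, one list per position
                (hypothetical; index 0 could hold the original word).
    score:      black-box function mapping an index vector to a number
                to maximize (assumed, stands in for the attack objective).
    """
    rng = random.Random(seed)
    # Start from uniform sampling probabilities at every position.
    probs = [[1.0 / len(c)] * len(c) for c in candidates]
    best, best_score = None, float("-inf")
    n_elite = max(1, int(n_samples * elite_frac))

    for _ in range(n_iters):
        # Draw index vectors: one candidate index per position.
        samples = [
            [rng.choices(range(len(p)), weights=p)[0] for p in probs]
            for _ in range(n_samples)
        ]
        scored = sorted(((score(s), s) for s in samples),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        elite = [s for _, s in scored[:n_elite]]
        # Move each distribution toward the elite frequencies (smoothed).
        for pos, p in enumerate(probs):
            for k in range(len(p)):
                freq = sum(1 for s in elite if s[pos] == k) / n_elite
                p[k] = smoothing * freq + (1 - smoothing) * p[k]
    return best, best_score


# Toy problem: the "model" secretly rewards one hidden combination.
candidates = [["good", "fine", "decent"],
              ["movie", "film", "picture"],
              ["really", "truly", "quite"]]
hidden = [2, 1, 0]


def toy_score(idx):
    # Counts positions that match the hidden combination.
    return sum(1 for a, b in zip(idx, hidden) if a == b)


best, best_score = cross_entropy_search(candidates, toy_score)
print([c[i] for c, i in zip(candidates, best)], best_score)
```

The design point is that only scores of sampled candidates are needed, which suits the black-box setting: no gradients or model internals are used, and the sampling distributions concentrate on high-scoring replacements over iterations.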
Taxonomy
Topics: Radiation Effects in Electronics
