Adversarial Evasion Attack Efficiency against Large Language Models
João Vitorino, Eva Maia, Isabel Praça

TL;DR
This paper analyzes the effectiveness, efficiency, and practicality of three types of adversarial attacks against five large language models in a sentiment classification task, highlighting differences in attack impact and in the number of perturbations and queries each attack requires.
Contribution
It provides a comparative analysis of three types of adversarial attacks against five LLMs, emphasizing their effectiveness and practicality for robustness assessment.
Findings
Word-level attacks are more effective than character-level attacks.
Character-level and more constrained attacks are more practical, requiring fewer perturbations and queries.
Differences in attack impact are crucial for developing robust defenses.
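Comparing "fewer perturbations" across attack granularities requires some way of measuring how much a text was changed. The sketch below assumes two common measures, character-level Levenshtein distance and the number of changed word positions; the paper's own perturbation metric is not stated in this summary, and the function names `levenshtein` and `changed_words` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def changed_words(original: str, adversarial: str) -> int:
    """Number of word positions that differ; assumes words were replaced, not added or removed."""
    orig_words, adv_words = original.split(), adversarial.split()
    if len(orig_words) != len(adv_words):
        raise ValueError("changed_words expects the same number of words")
    return sum(o != a for o, a in zip(orig_words, adv_words))


if __name__ == "__main__":
    original = "the movie was great"
    swapped = "the movie was graet"  # one adjacent character swap
    print(levenshtein(original, swapped), changed_words(original, swapped))  # 2 1
```

A single adjacent-character swap counts as two edits under plain Levenshtein distance but as one changed word; the Damerau-Levenshtein variant would count the transposition as one edit. Which measure is appropriate depends on the attack granularity being compared.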
Abstract
Large Language Models (LLMs) are valuable for text classification, but their vulnerabilities must not be disregarded. They lack robustness against adversarial examples, so it is pertinent to understand the impacts of different types of perturbations and to assess whether those attacks could be replicated by common users with a small number of perturbations and a small number of queries to a deployed LLM. This work presents an analysis of the effectiveness, efficiency, and practicality of three different types of adversarial attacks against five different LLMs in a sentiment classification task. The obtained results demonstrated the very distinct impacts of the word-level and character-level attacks. The word attacks were more effective, but the character and more constrained attacks were more practical and required a reduced number of perturbations and queries. These differences need to be…
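To make "number of perturbations" and "number of queries" concrete, the following is a minimal sketch of a generic greedy black-box substitution attack under a query budget. It is not one of the attacks evaluated in the paper: `toy_sentiment` is a hypothetical keyword-lexicon stand-in for a deployed LLM, and `char_candidates` (adjacent interior character swaps) and `word_candidates` (lookup in a hypothetical `SYNONYMS` table) are illustrative character-level and word-level perturbation operators.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical keyword lexicon: a toy stand-in for a deployed LLM classifier.
POSITIVE = {"great", "good", "excellent", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful", "dull"}

# Hypothetical synonym table for the word-level operator.
SYNONYMS = {"great": ["fine", "decent"], "movie": ["film"]}


def toy_sentiment(text: str) -> float:
    """Positive-sentiment score in [0, 1]; a score > 0.5 means 'positive'."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + 1) / (pos + neg + 2)  # smoothed ratio of positive cues


def char_candidates(word: str) -> List[str]:
    """Character-level: swap each adjacent pair of interior characters."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(1, len(word) - 2)]


def word_candidates(word: str) -> List[str]:
    """Word-level: replace the whole word with a listed synonym."""
    return SYNONYMS.get(word.lower(), [])


@dataclass
class AttackResult:
    text: str
    success: bool       # did the predicted label flip?
    perturbations: int  # number of words replaced
    queries: int        # number of classifier calls


def greedy_attack(text: str, classify: Callable[[str], float],
                  candidates: Callable[[str], List[str]],
                  max_queries: int = 50) -> AttackResult:
    """Greedily perturb words left to right until the label flips or the budget runs out."""
    words = text.split()
    score = classify(text)
    queries = 1
    original_positive = score > 0.5
    # Confidence in the original label; the attack tries to lower it.
    conf = score if original_positive else 1 - score
    perturbations = 0
    for i, word in enumerate(words):
        best_word, best_conf, best_score = word, conf, score
        for cand in candidates(word):
            if queries >= max_queries:  # query budget exhausted
                return AttackResult(" ".join(words), False, perturbations, queries)
            trial = words[:i] + [cand] + words[i + 1:]
            s = classify(" ".join(trial))
            queries += 1
            c = s if original_positive else 1 - s
            if c < best_conf:
                best_word, best_conf, best_score = cand, c, s
        if best_word != word:  # keep the most damaging candidate
            words[i] = best_word
            perturbations += 1
            conf, score = best_conf, best_score
            if (score > 0.5) != original_positive:
                return AttackResult(" ".join(words), True, perturbations, queries)
    return AttackResult(" ".join(words), False, perturbations, queries)


if __name__ == "__main__":
    sentence = "the movie was great"
    print(greedy_attack(sentence, toy_sentiment, char_candidates))
    # AttackResult(text='the movie was gerat', success=True, perturbations=1, queries=5)
    print(greedy_attack(sentence, toy_sentiment, word_candidates))
    # AttackResult(text='the movie was fine', success=True, perturbations=1, queries=4)
```

Every call to `classify` counts as one query and every accepted word replacement counts as one perturbation, so the two counters correspond to the efficiency measures the abstract refers to. The toy outputs only illustrate the bookkeeping; they say nothing about the relative effectiveness reported in the paper.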
Taxonomy
Topics: Adversarial Robustness in Machine Learning
