Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Arijit Ghosh Chowdhury, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha

TL;DR
This survey comprehensively reviews various attack methods on Large Language Models, analyzing their mechanisms, impacts, and defense strategies to enhance understanding and promote the development of more secure NLP systems.
Contribution
It provides a detailed overview of attack types, their effectiveness, and current defenses against LLM vulnerabilities, highlighting gaps and future research directions.
Findings
Adversarial attacks can significantly manipulate LLM outputs.
Data poisoning poses risks during model training.
Current defenses vary in effectiveness and need improvement.
Abstract
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP), offering transformative capabilities in understanding and generating human-like text. However, with their rising prominence, the security and vulnerability aspects of these models have garnered significant attention. This paper presents a comprehensive survey of the various forms of attacks targeting LLMs, discussing the nature and mechanisms of these attacks, their potential impacts, and current defense strategies. We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation. The paper also explores the effectiveness of different attack methodologies, the resilience of LLMs against these attacks, and the implications for model integrity and user trust. By…
Taxonomy
Topics: Topic Modeling
