Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, CJ Barberan, Richard Anarfi

TL;DR
This paper investigates various vulnerabilities of Large Language Models, including model-based, training-time, and inference-time flaws, and discusses mitigation strategies to enhance their robustness and security.
Contribution
It provides a comprehensive analysis of LLM vulnerabilities and introduces mitigation approaches like Model Editing and Chroma Teaming to improve resilience.
Findings
LLMs can leak sensitive data when prompted maliciously
Mitigation strategies can reduce vulnerability risks
Understanding vulnerabilities guides development of more secure LLMs
Abstract
Large Language Models (LLMs) have gained significant popularity and wielded immense power across various domains within Natural Language Processing (NLP). While their capabilities are undeniably impressive, it is crucial to identify and scrutinize their vulnerabilities, especially when those vulnerabilities can have costly consequences. For example, an LLM trained to produce concise summaries of medical documents could leak personal patient data when prompted surreptitiously. This is just one of many unfortunate examples that have been uncovered, and further research is necessary to understand the underlying reasons behind such vulnerabilities. In this study, we examine several categories of vulnerabilities (model-based, training-time, and inference-time) and discuss mitigation strategies including "Model Editing" which aims at…
Taxonomy
Topics: Law, AI, and Intellectual Property · Blockchain Technology Applications and Security
