Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

Sara Abdali; Jia He; CJ Barberan; Richard Anarfi

arXiv:2407.20529 · cs.LG · July 31, 2024 · 1 citation

TL;DR

This paper investigates various vulnerabilities of Large Language Models, including model-based, training-time, and inference-time flaws, and discusses mitigation strategies to enhance their robustness and security.

Contribution

It analyzes LLM vulnerabilities across the model, training, and inference stages, and discusses mitigation approaches such as Model Editing and Chroma Teaming to improve resilience.

Findings

1. LLMs can leak sensitive data when prompted maliciously
2. Mitigation strategies can reduce vulnerability risks
3. Understanding vulnerabilities guides development of more secure LLMs

Abstract

Large Language Models (LLMs) have gained significant popularity and wield immense power across various domains within Natural Language Processing (NLP). While their capabilities are undeniably impressive, it is crucial to identify and scrutinize their vulnerabilities, especially when those vulnerabilities can have costly consequences. For example, an LLM trained to produce concise summaries of medical documents could leak personal patient data when prompted surreptitiously. This is just one of many unfortunate examples that have been unveiled, and further research is necessary to comprehend the underlying reasons behind such vulnerabilities. In this study, we delve into three categories of vulnerabilities (model-based, training-time, and inference-time) and discuss mitigation strategies including "Model Editing" which aims at…

Taxonomy

Topics: Law, AI, and Intellectual Property · Blockchain Technology Applications and Security