Imprompter: Tricking LLM Agents into Improper Tool Use

Xiaohan Fu; Shuheng Li; Zihan Wang; Yihao Liu; Rajesh K. Gupta; Taylor Berg-Kirkpatrick; Earlence Fernandes

arXiv:2410.14923 · cs.CR · October 23, 2024 · 2 cites

TL;DR

This paper reveals security vulnerabilities in LLM-based agents by demonstrating automatic prompt attacks that can exfiltrate sensitive data, highlighting the need for improved safeguards in these emerging systems.

Contribution

It introduces a novel class of automatic adversarial prompt attacks targeting LLM agents, demonstrating their effectiveness across multiple platforms and modalities.

Findings

01 Attacks achieve a nearly 80% success rate in data exfiltration.

02 Attacks transfer effectively to production-level agents.

03 Attacks work in both text-only and image (multimodal) domains.

Abstract

Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources. These agent-based systems represent an emerging shift in personal computing. We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks that violate the confidentiality and integrity of user resources connected to an LLM agent. We show how prompt optimization techniques can find such prompts automatically given the weights of a model. We demonstrate that such attacks transfer to production-level agents. For example, we show an information exfiltration attack on Mistral's LeChat agent that analyzes a user's conversation, picks out personally identifiable information, and formats it into a…

Code & Models

Repositories

Reapor-Yurnero/imprompter
PyTorch · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Business Process Modeling and Analysis · Service-Oriented Architecture and Web Services