Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

Daniel Kang; Xuechen Li; Ion Stoica; Carlos Guestrin; Matei Zaharia; Tatsunori Hashimoto

arXiv:2302.05733·cs.CR·February 14, 2023·27 cites

Open Access

TL;DR

This paper shows how advanced instruction-following large language models can be exploited for malicious dual-use purposes, such as generating harmful content. This creates significant security challenges and gives malicious actors an economic incentive to misuse the models.

Contribution

It demonstrates that current LLMs can be exploited to produce malicious content that bypasses the defenses implemented by LLM API vendors, revealing new security risks and the need for improved mitigation strategies.

Findings

01

LLMs can produce targeted malicious content, including hate speech and scams.

02

Malicious content can likely be generated at lower cost than with human effort alone.

03

Current defenses are insufficient against sophisticated attacks.

Abstract

Recent advances in instruction-following large language models (LLMs) have led to dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same improved capabilities amplify the dual-use risks for malicious purposes of these models. Dual-use is difficult to prevent as instruction-following capabilities now enable standard attacks from computer security. The capabilities of these instruction-following LLMs provide strong economic incentives for dual-use by malicious actors. In particular, we show that instruction-following LLMs can produce targeted malicious content, including hate speech and scams, bypassing in-the-wild defenses implemented by LLM API vendors. Our analysis shows that this content can be generated economically and at cost likely lower than with human effort alone. Together, our findings suggest that LLMs will increasingly attract more sophisticated…

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques