SoK: Prompt Hacking of Large Language Models

Baha Rababah; Shang (Tommy) Wu; Matthew Kwiatkowski; Carson Leung; Cuneyt Gurcan Akcora

arXiv:2410.13901 · cs.CR · October 21, 2024

TL;DR

This paper provides a comprehensive overview of prompt hacking attacks on large language models, introduces a new response classification framework, and discusses implications for improving AI safety and robustness.

Contribution

The paper systematically categorizes prompt hacking into three types and proposes a five-class response evaluation framework for finer-grained safety assessment.

Findings

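1. Identifies three main prompt hacking types: jailbreaking, leaking, and injection.

2. Introduces a five-class categorization of LLM responses that replaces the traditional binary classification, enabling more detailed analysis.

3. Argues that this finer-grained categorization improves diagnostic precision when evaluating LLM safety and robustness.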

Abstract

The safety and robustness of applications based on large language models (LLMs) remain critical challenges in artificial intelligence. Among the key threats to these applications are prompt hacking attacks, which can significantly undermine the security and reliability of LLM-based systems. In this work, we offer a comprehensive and systematic overview of three distinct types of prompt hacking: jailbreaking, leaking, and injection, addressing the nuances that differentiate them despite their overlapping characteristics. To enhance the evaluation of LLM-based applications, we propose a novel framework that categorizes LLM responses into five distinct classes, moving beyond the traditional binary classification. This approach provides more granular insights into the model's behavior, improving diagnostic precision and enabling more targeted enhancements to the system's safety and robustness.
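
Why a graded scheme can reveal more than a binary one is easiest to see in a short Python sketch. This is an illustration under stated assumptions, not the paper's framework. The five level names below are hypothetical placeholders, not the classes defined by the authors. The binary rule ("any non-refusal counts as a successful attack") and the helpers `binary_success_rate` and `five_class_profile` are also illustrative inventions. The sketch shows only that collapsing graded labels to pass/fail can hide real differences between two systems.

```python
from collections import Counter

# HYPOTHETICAL five-level response scale (illustrative only; NOT the paper's classes).
# Level 0 = full refusal ... level 4 = full compliance with the attack prompt.
LEVELS = ("refusal", "evasive", "partial_compliance", "mostly_compliant", "full_compliance")


def binary_success_rate(labels, threshold=1):
    """Traditional binary view (assumed rule): any level >= threshold counts as a successful attack."""
    if not labels:
        raise ValueError("labels must be non-empty")
    hits = sum(1 for level in labels if level >= threshold)
    return hits / len(labels)


def five_class_profile(labels):
    """Keep the full count of responses at each of the five hypothetical levels."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(LEVELS)}


if __name__ == "__main__":
    # Two hypothetical models, each evaluated on the same six attack prompts.
    model_a = [0, 0, 1, 1, 2, 2]  # non-refusals are only evasive or partial
    model_b = [0, 0, 4, 4, 4, 4]  # non-refusals are full compliance
    print(binary_success_rate(model_a), binary_success_rate(model_b))  # same value for both
    print(five_class_profile(model_a))
    print(five_class_profile(model_b))
```

Under these assumptions, both hypothetical models have the same binary attack success rate. Their five-level profiles, however, show that one only evades or partially complies while the other fully complies. This is the kind of distinction a binary label cannot express.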
