Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Apurv Verma; Satyapriya Krishna; Sebastian Gehrmann; Madhavan Seshadri; Anu Pradhan; Tom Ault; Leslie Barrett; David Rabinowitz; John Doucette; NhatHai Phan

arXiv:2407.14937 · cs.CL · December 30, 2025 · 3 cites

TL;DR

This paper develops a comprehensive threat model and taxonomy for red-teaming large language models, offering insights, defense methods, and practical strategies to enhance their security and robustness.

Contribution

The paper introduces a detailed threat model and taxonomy for LLM red-teaming, systematizes existing research on attacks, and compiles defense methods and practical red-teaming strategies.

Findings

01 Developed a taxonomy of LLM attacks based on the stages of LLM development and deployment

02 Compiled practical red-teaming strategies for practitioners

03 Identified key attack motifs and entry points in LLM systems

Abstract

Creating secure and resilient applications with large language models (LLMs) requires anticipating, adjusting to, and countering unforeseen threats. Red-teaming has emerged as a critical technique for identifying vulnerabilities in real-world LLM implementations. This paper presents a detailed threat model and provides a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. By delineating prominent attack motifs and shedding light on various entry points, this paper provides a framework for improving the security and robustness of LLM-based systems.

Code & Models

Repositories

dapurv5/awesome-red-teaming-llms (Official)

Taxonomy

Topics: Topic Modeling