garak: A Framework for Security Probing Large Language Models

Leon Derczynski; Erick Galinkin; Jeffrey Martin; Subho Majumdar; Nanna; Inie

arXiv:2406.11036·cs.CL·June 18, 2024·6 cites

garak: A Framework for Security Probing Large Language Models

Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna, Inie

PDF

Open Access 2 Repos

TL;DR

Garak is a comprehensive framework designed to systematically evaluate the security vulnerabilities of large language models through structured probing, aiding in understanding weaknesses and informing deployment policies.

Contribution

The paper introduces garak, a novel framework for holistic and structured security probing of LLMs, addressing the dynamic and context-dependent nature of model vulnerabilities.

Findings

01

Garak effectively identifies potential security weaknesses in LLMs.

02

The framework supports context-aware vulnerability assessment.

03

It facilitates informed discussions on LLM safety and deployment policies.

Abstract

As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weak in one context may not be an issue in a different context; one-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes ``LLM security'', and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling