Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors

Michelle S. Lam; Fred Hohman; Dominik Moritz; Jeffrey P. Bigham; Kenneth Holstein; Mary Beth Kery

arXiv:2409.18203 · cs.HC · August 4, 2025

TL;DR

Policy Maps provide a novel, map-inspired approach for designing and navigating AI policies for large language models, enabling targeted control over complex behavior spaces through interactive tools.

Contribution

Introduction of policy maps and the Policy Projector tool, facilitating effective policy design and navigation in the vast behavior space of LLMs.

Findings

01. System helps policy designers address problematic behaviors
02. Supports interactive policy authoring with classification and steering
03. Assists in crafting policies for safety and bias issues

Abstract

AI policy sets boundaries on acceptable behavior for AI models, but this is challenging in the context of large language models (LLMs): how do you ensure coverage over a vast behavior space? We introduce policy maps, an approach to AI policy design inspired by the practice of physical mapmaking. Instead of aiming for full coverage, policy maps aid effective navigation through intentional design choices about which aspects to capture and which to abstract away. With Policy Projector, an interactive tool for designing LLM policy maps, an AI practitioner can survey the landscape of model input-output pairs, define custom regions (e.g., "violence"), and navigate these regions with if-then policy rules that can act on LLM outputs (e.g., if output contains "violence" and "graphic details," then rewrite without "graphic details"). Policy Projector supports interactive policy authoring using…
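The if-then rule from the abstract (if output contains "violence" and "graphic details," then rewrite without "graphic details") can be pictured with a small Python sketch. This is only an illustration and not Policy Projector's implementation: the keyword lists, `detect_concepts`, `PolicyRule`, `rewrite_without`, and `apply_policy` are all hypothetical names, and simple keyword matching stands in for whatever concept classification and rewriting the actual tool performs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical keyword lists standing in for real concept classifiers.
CONCEPT_KEYWORDS: Dict[str, Set[str]] = {
    "violence": {"attack", "stab", "punch"},
    "graphic details": {"blood", "gore", "wound"},
}


def detect_concepts(text: str) -> Set[str]:
    """Return every concept whose keywords appear in the text (toy classifier)."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return {c for c, kws in CONCEPT_KEYWORDS.items() if words & kws}


@dataclass
class PolicyRule:
    """An if-then rule: if all `if_concepts` are present, take `action`."""
    if_concepts: Set[str]
    action: str                # assumed actions: "rewrite" or "block"
    remove_concept: str = ""   # concept to strip when action == "rewrite"

    def matches(self, concepts: Set[str]) -> bool:
        return self.if_concepts <= concepts


def rewrite_without(text: str, concept: str) -> str:
    """Toy rewrite: drop each sentence in which the concept is detected."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if concept not in detect_concepts(s)]
    return ". ".join(kept) + "." if kept else ""


def apply_policy(output: str, rules: List[PolicyRule]) -> str:
    """Apply the first matching rule to a model output; pass through otherwise."""
    concepts = detect_concepts(output)
    for rule in rules:
        if rule.matches(concepts):
            if rule.action == "rewrite":
                return rewrite_without(output, rule.remove_concept)
            if rule.action == "block":
                return "[blocked by policy]"
    return output


rules = [
    PolicyRule(
        if_concepts={"violence", "graphic details"},
        action="rewrite",
        remove_concept="graphic details",
    )
]
story = "The knight chose to attack the gate. Blood covered the stones. The siege ended at dawn."
print(apply_policy(story, rules))
```

In this sketch a rule fires only when every listed concept is detected, which mirrors the "and" in the abstract's example; an output that matches no rule is returned unchanged.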

Taxonomy

Methods: Sparse Evolutionary Training