Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors
Michelle S. Lam, Fred Hohman, Dominik Moritz, Jeffrey P. Bigham, Kenneth Holstein, Mary Beth Kery

TL;DR
Policy Maps provide a novel, map-inspired approach for designing and navigating AI policies for large language models, enabling targeted control over complex behavior spaces through interactive tools.
Contribution
Introduces policy maps, a map-inspired approach to AI policy design, and Policy Projector, an interactive tool for designing these maps and navigating the vast behavior space of LLMs.
Findings
System helps policy designers address problematic behaviors
Supports interactive policy authoring with classification and steering
Assists in crafting policies for safety and bias issues
Abstract
AI policy sets boundaries on acceptable behavior for AI models, but this is challenging in the context of large language models (LLMs): how do you ensure coverage over a vast behavior space? We introduce policy maps, an approach to AI policy design inspired by the practice of physical mapmaking. Instead of aiming for full coverage, policy maps aid effective navigation through intentional design choices about which aspects to capture and which to abstract away. With Policy Projector, an interactive tool for designing LLM policy maps, an AI practitioner can survey the landscape of model input-output pairs, define custom regions (e.g., "violence"), and navigate these regions with if-then policy rules that can act on LLM outputs (e.g., if output contains "violence" and "graphic details," then rewrite without "graphic details"). Policy Projector supports interactive policy authoring using…
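The if-then rule quoted in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the Policy Projector implementation: the `PolicyRule` class, the keyword lists, and the helpers `detect_concepts` and `remove_concept` are hypothetical, and simple keyword matching stands in for whatever classification and steering methods the tool actually uses.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

# Hypothetical keyword lists standing in for a real concept classifier.
CONCEPT_KEYWORDS: Dict[str, List[str]] = {
    "violence": ["attack", "stab", "shoot"],
    "graphic details": ["blood", "gore", "wound"],
}


def detect_concepts(text: str) -> Set[str]:
    """Return every concept whose keywords appear in the text (assumed classifier)."""
    lowered = text.lower()
    return {
        concept
        for concept, keywords in CONCEPT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def remove_concept(text: str, concept: str) -> str:
    """Crude rewrite: drop each sentence that mentions the concept's keywords."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [
        s for s in sentences
        if not any(keyword in s.lower() for keyword in CONCEPT_KEYWORDS[concept])
    ]
    return ". ".join(kept) + "." if kept else ""


@dataclass(frozen=True)
class PolicyRule:
    """If the output matches all `if_concepts`, rewrite it without `rewrite_without`."""
    if_concepts: FrozenSet[str]
    rewrite_without: str


def apply_rule(rule: PolicyRule, output: str) -> str:
    """Apply the rule only when every condition concept is detected in the output."""
    if rule.if_concepts <= detect_concepts(output):
        return remove_concept(output, rule.rewrite_without)
    return output


rule = PolicyRule(
    if_concepts=frozenset({"violence", "graphic details"}),
    rewrite_without="graphic details",
)
story = "The knight attacked the bandit. Blood pooled on the floor. The village was safe."
rewritten = apply_rule(rule, story)
untouched = apply_rule(rule, "The village was safe.")
```

In this sketch the rule fires on `story` because both concepts are detected, and the sentence carrying the graphic detail is dropped; an output that matches neither concept passes through unchanged. A real system would replace the keyword lookup with a learned classifier and the sentence filter with a model-based rewrite.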
Taxonomy
Methods: Sparse Evolutionary Training
