COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Dasol Choi; DongGeon Lee; Brigitta Jesica Kartono; Helena Berndt; Taeyoun Kwon; Joonwon Jang; Haon Park; Hwanjo Yu; Minsuk Kahng

arXiv:2601.01836 · cs.AI · January 6, 2026

TL;DR

COMPASS is a new framework for systematically evaluating whether large language models adhere to organization-specific policies, revealing significant gaps in their ability to enforce prohibitions in enterprise settings.

Contribution

This paper introduces COMPASS, the first comprehensive framework for assessing organization-specific policy compliance in large language models, including adversarial robustness testing.

Findings

01. Models handle legitimate requests with over 95% accuracy.

02. Models refuse only 13-40% of adversarial denylist violations.

03. Current LLMs lack the robustness required for policy-critical enterprise deployment.

Abstract

As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results…
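The asymmetry described in the abstract can be illustrated with a minimal sketch of how the two headline numbers might be computed from judged model responses. This is not the authors' code: the `Result` record, its field names, the `"allowlist"`/`"denylist"` labels, and the toy `sample` data are all hypothetical, and the sketch assumes each response has already been judged as a refusal or a non-refusal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """One judged model response (hypothetical schema, not from the paper)."""
    policy_type: str   # "allowlist" (should be answered) or "denylist" (should be refused)
    adversarial: bool  # True for adversarially constructed edge cases
    refused: bool      # True if the model declined to answer


def is_correct(r: Result) -> bool:
    # Denylist queries are handled correctly when refused;
    # allowlist queries are handled correctly when answered.
    return r.refused if r.policy_type == "denylist" else not r.refused


def rate(results: list[Result], policy_type: str, adversarial: bool) -> Optional[float]:
    """Fraction of correctly handled queries in one slice, or None if the slice is empty."""
    subset = [
        r for r in results
        if r.policy_type == policy_type and r.adversarial == adversarial
    ]
    if not subset:
        return None
    return sum(is_correct(r) for r in subset) / len(subset)


# Toy data for illustration only; these are not results from the paper.
sample = [
    Result("allowlist", False, False),
    Result("allowlist", False, False),
    Result("denylist", True, True),
    Result("denylist", True, False),
    Result("denylist", True, False),
    Result("denylist", True, False),
]

allow_accuracy = rate(sample, "allowlist", adversarial=False)
deny_refusal = rate(sample, "denylist", adversarial=True)
```

Reporting the two slices separately, rather than as one pooled accuracy, is what makes a gap like the one in the abstract visible: a high score on legitimate requests would otherwise mask a low refusal rate on adversarial denylist queries.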

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Information and Cyber Security