Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought

Jianfeng Si; Lin Sun; Weihong Lin; Xiangzheng Zhang

arXiv:2602.06650 · cs.CL · February 9, 2026

Open Access

TL;DR

This paper introduces PACT, a hierarchical, risk-aware framework for dynamic safety control in LLMs, balancing safety and helpfulness through explicit policies and transparent decision paths.

Contribution

It proposes a novel hierarchical safety policy architecture with global and user-defined policies, enabling flexible, transparent, and effective safety management in LLMs.

Findings

01 Achieves near state-of-the-art safety performance with global policies.

02 Attains superior controllability with user-specific policies.

03 Effectively mitigates the safety-helpfulness trade-off.

Abstract

Large Language Models (LLMs) face a fundamental safety-helpfulness trade-off due to static, one-size-fits-all safety policies that lack runtime controllability, making it difficult to tailor responses to diverse application needs. As a result, models may over-refuse benign requests or under-constrain harmful ones. We present PACT (Prompt-configured Action via Chain-of-Thought), a framework for dynamic safety control through explicit, risk-aware reasoning. PACT operates under a hierarchical policy architecture: a non-overridable global safety policy establishes immutable boundaries for critical risks (e.g., child safety, violent extremism), while user-defined policies can introduce domain-specific (non-global) risk categories and specify label-to-action behaviors to improve utility in real-world deployment settings. The framework decomposes safety decisions into structured…
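
The two-tier policy lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: PACT makes these decisions through chain-of-thought reasoning inside the model, whereas the code below only shows the precedence rule (global policy first, user policy second) as a plain dictionary lookup. The action names (`COMPLY`, `GUIDE`, `REFUSE`), the function `resolve_action`, the label strings, and the choice of `COMPLY` as the fallback for unlisted labels are all hypothetical and introduced here for illustration.

```python
from enum import Enum
from types import MappingProxyType


class Action(Enum):
    # Hypothetical action set; the paper's actual labels may differ.
    COMPLY = "comply"
    GUIDE = "guide"
    REFUSE = "refuse"


# Global policy: read-only mapping for critical risk categories.
# The two categories are the examples named in the abstract.
GLOBAL_POLICY = MappingProxyType({
    "child_safety": Action.REFUSE,
    "violent_extremism": Action.REFUSE,
})


def resolve_action(risk_label, user_policy, default=Action.COMPLY):
    """Return the action for a risk label under a two-tier policy.

    The global policy is consulted first and always wins, so a user
    policy cannot override it. User policies only apply to labels
    that the global policy does not cover.
    """
    if risk_label in GLOBAL_POLICY:
        return GLOBAL_POLICY[risk_label]
    # Assumption: labels covered by neither tier fall back to `default`.
    return user_policy.get(risk_label, default)


# A user policy that adds a domain-specific category and also tries
# (unsuccessfully) to relax a global one.
user_policy = {
    "medical_advice": Action.GUIDE,
    "child_safety": Action.COMPLY,
}

print(resolve_action("child_safety", user_policy))    # Action.REFUSE
print(resolve_action("medical_advice", user_policy))  # Action.GUIDE
print(resolve_action("weather", user_policy))         # Action.COMPLY
```

Checking the global mapping before the user mapping is what makes the global tier non-overridable in this sketch; wrapping it in `MappingProxyType` additionally prevents accidental mutation at runtime.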

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Ethics and Social Impacts of AI