Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought
Jianfeng Si, Lin Sun, Weihong Lin, Xiangzheng Zhang

TL;DR
This paper introduces PACT, a hierarchical, risk-aware framework for dynamic safety control in LLMs, balancing safety and helpfulness through explicit policies and transparent decision paths.
Contribution
It proposes a novel hierarchical safety policy architecture with global and user-defined policies, enabling flexible, transparent, and effective safety management in LLMs.
Findings
Achieves near state-of-the-art safety performance with global policies.
Attains superior controllability with user-specific policies.
Effectively mitigates the safety-helpfulness trade-off.
Abstract
Large Language Models (LLMs) face a fundamental safety-helpfulness trade-off due to static, one-size-fits-all safety policies that lack runtime controllability, making it difficult to tailor responses to diverse application needs. We present PACT (Prompt-configured Action via Chain-of-Thought), a framework for dynamic safety control through explicit, risk-aware reasoning. PACT operates under a hierarchical policy architecture: a non-overridable global safety policy establishes immutable boundaries for critical risks (e.g., child safety, violent extremism), while user-defined policies can introduce domain-specific (non-global) risk categories and specify label-to-action behaviors to improve utility in real-world deployment settings. The framework decomposes safety decisions into structured…
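The precedence rule in the hierarchical policy architecture can be illustrated with a minimal sketch. It assumes that risk labels are plain strings and that the label-to-action step runs after the model has already assigned a label. The class names `GlobalPolicy` and `UserPolicy`, the function `resolve_action`, and the action strings are hypothetical illustrations, not the paper's implementation. The sketch covers only the final lookup. It does not model the chain-of-thought risk classification itself.

```python
from dataclasses import dataclass, field

# Hypothetical action vocabulary; the paper's actual action set is not given here.
REFUSE = "refuse"
COMPLY = "comply"
SAFE_COMPLETE = "safe_complete"


@dataclass(frozen=True)
class GlobalPolicy:
    # Critical risk categories mapped to actions that no user policy can override.
    immutable: dict[str, str]


@dataclass
class UserPolicy:
    # Domain-specific (non-global) categories plus label-to-action behaviors.
    label_to_action: dict[str, str] = field(default_factory=dict)


def resolve_action(label: str, global_policy: GlobalPolicy,
                   user_policy: UserPolicy, default: str = COMPLY) -> str:
    """Map a risk label to an action; the global policy always takes precedence."""
    # 1. Global boundaries are checked first, so user entries for them are ignored.
    if label in global_policy.immutable:
        return global_policy.immutable[label]
    # 2. Otherwise, the user-defined policy decides if it covers this label.
    if label in user_policy.label_to_action:
        return user_policy.label_to_action[label]
    # 3. Labels covered by neither policy fall back to a default action.
    return default


if __name__ == "__main__":
    global_policy = GlobalPolicy(
        immutable={"child_safety": REFUSE, "violent_extremism": REFUSE}
    )
    user_policy = UserPolicy(label_to_action={
        "medical_advice": SAFE_COMPLETE,  # hypothetical domain-specific category
        "child_safety": COMPLY,           # attempted override of a global category
    })
    print(resolve_action("child_safety", global_policy, user_policy))    # refuse
    print(resolve_action("medical_advice", global_policy, user_policy))  # safe_complete
    print(resolve_action("weather", global_policy, user_policy))         # comply
```

In this sketch, a user entry that targets a global category is silently ignored. A stricter variant could instead reject such a user policy when it is loaded, which makes the attempted override visible to the deployer.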
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Ethics and Social Impacts of AI
