Chain-of-Authorization: Embedding authorization into large language models
Yang Li, Yule Liu, Xinlei He, Youjian Zhao, Qi Li, Ke Xu

TL;DR
The paper introduces the Chain-of-Authorization (CoA) framework, which embeds access control into large language models by making authorization part of the model's own reasoning process.
Contribution
It presents a paradigm that internalizes authorization within LLMs: the model generates a structured authorization trajectory before producing any substantive response or action.
Findings
CoA maintains high utility in authorized scenarios.
CoA achieves high rejection rates of unauthorized prompts.
CoA provides robust defense against adversarial attacks.
Abstract
Although Large Language Models (LLMs) have evolved from text generators into the cognitive core of modern AI systems, their inherent lack of authorization awareness exposes these systems to catastrophic risks, ranging from unintentional data leakage to unauthorized command execution. Existing defense mechanisms are fundamentally decoupled from internal reasoning, rendering them insufficient for the complex security demands of dynamic AI systems. Here, we propose the Chain-of-Authorization (CoA) framework, a paradigm that internalizes access control as a foundational cognitive capability. By systematically redesigning the input-output format and fine-tuning the model on synthesized data with complex permission topologies, CoA forces the model to generate a structured authorization trajectory as a causal prerequisite for any substantive response or action, thereby enabling LLMs to…
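The ordering constraint described in the abstract (an authorization trajectory must exist, and must pass, before any substantive response) can be illustrated with a small external sketch. This is not the paper's input-output format, which the abstract does not specify, and CoA trains the model to produce the trajectory itself rather than relying on wrapper code. The names `AuthorizationStep`, `build_trajectory`, `respond`, and the role-per-resource `policy` dictionary are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AuthorizationStep:
    """One hypothetical entry in an authorization trajectory."""
    resource: str
    required_role: Optional[str]  # None means the policy does not know the resource
    granted: bool


def build_trajectory(
    user_roles: Set[str], requested: List[str], policy: Dict[str, str]
) -> List[AuthorizationStep]:
    """Record one permission check per requested resource, in request order."""
    steps = []
    for resource in requested:
        required = policy.get(resource)
        # Deny by default: unknown resources are never granted.
        granted = required is not None and required in user_roles
        steps.append(AuthorizationStep(resource, required, granted))
    return steps


def respond(
    user_roles: Set[str],
    requested: List[str],
    policy: Dict[str, str],
    answer_fn: Callable[[List[str]], str],
) -> Tuple[List[AuthorizationStep], str]:
    """Produce the trajectory first; call answer_fn only if every step passed."""
    trajectory = build_trajectory(user_roles, requested, policy)
    denied = [step.resource for step in trajectory if not step.granted]
    if denied:
        return trajectory, "REFUSED: no permission for " + ", ".join(denied)
    return trajectory, answer_fn(requested)
```

In this sketch the refusal path never invokes `answer_fn`, which mirrors the idea of the trajectory being a prerequisite for the response rather than a filter applied afterwards.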
