Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models

Zhaopeng Zhang; Pengcheng Sun; Lan Zhang; Chen Tang; Jiewei Lai; Yunhao Wang; Hui Jin

arXiv:2601.13630·cs.CL·January 21, 2026

Open Access

TL;DR

This paper introduces AAAC, a training-free method that uses activation-space clustering to enforce fine-grained access control in large language models, reducing permission violation rates by up to 86.5%.

Contribution

We propose AAAC, a novel activation-space-based, training-free framework for multi-class permission control in LLMs that restricts access effectively without fine-tuning.

Findings

01. Reduces permission violation rates by up to 86.5%.

02. Decreases prompt-based attack success rates by 90.7%.

03. Maintains high response usability with minimal inference overhead.

Abstract

Large language models (LLMs) are increasingly deployed over knowledge bases for efficient knowledge retrieval and question answering. However, LLMs can inadvertently answer beyond a user's permission scope and leak sensitive content, which makes it difficult to deploy knowledge-base QA under fine-grained access control requirements. In this work, we identify a geometric regularity in intermediate activations: for the same query, representations induced by different permission scopes form distinct, readily separable clusters. Building on this separability, we propose Activation-space Anchored Access Control (AAAC), a training-free framework for multi-class permission control. AAAC constructs an anchor bank, with one permission anchor per class, from a small offline sample set and requires no fine-tuning. At inference time, a multi-anchor steering mechanism redirects each query's…
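The anchor-bank idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how anchors are computed or how steering is applied, so the sketch assumes each anchor is the mean activation of its permission class and that steering is a linear interpolation toward an anchor. The function names (`build_anchor_bank`, `nearest_anchor`, `steer`), the class labels, the 2-D vectors, and the `alpha` parameter are all hypothetical.

```python
from statistics import fmean


def build_anchor_bank(samples):
    """Build one anchor per permission class from offline activation samples.

    samples: dict mapping a permission class to a list of activation vectors.
    Assumption: an anchor is the per-dimension mean of its class's vectors.
    """
    return {
        cls: [fmean(dim) for dim in zip(*vectors)]
        for cls, vectors in samples.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_anchor(bank, activation):
    """Return the permission class whose anchor is closest to the activation."""
    return min(bank, key=lambda cls: squared_distance(bank[cls], activation))


def steer(activation, anchor, alpha=0.5):
    """Move an activation toward an anchor.

    Assumption: plain linear interpolation, used here only as a stand-in
    for the steering mechanism, which the abstract does not detail.
    """
    return [(1 - alpha) * h + alpha * a for h, a in zip(activation, anchor)]


# Toy 2-D "activations" for three hypothetical permission classes.
samples = {
    "public": [[0.0, 0.1], [0.2, -0.1]],
    "internal": [[1.0, 0.9], [0.8, 1.1]],
    "restricted": [[-1.0, 1.0], [-0.8, 1.2]],
}

bank = build_anchor_bank(samples)
query_activation = [0.9, 0.8]
predicted = nearest_anchor(bank, query_activation)
steered = steer(query_activation, bank["public"], alpha=1.0)
```

In a real model the vectors would be intermediate-layer activations with thousands of dimensions, and the choice of layer, distance measure, and steering strength would all need to be validated empirically.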


Taxonomy

Topics: Topic Modeling · Access Control and Trust · Advanced Graph Neural Networks