Do Coding Agents Understand Least-Privilege Authorization?
Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng, Ethan Qin, Jiannan Guan, Jinhao Liu, Qiming Yu, Yixin Yuan, Fanqing Meng, Carl Che, Mengkang Hu

TL;DR
This paper investigates whether current coding agents can accurately infer least-privilege permissions, introduces a new benchmark called AuthBench, and proposes a decomposition method to improve permission inference accuracy.
Contribution
The paper introduces AuthBench, a benchmark for evaluating permission-boundary inference, and proposes Sufficiency-Tightness Decomposition to improve the accuracy of the permissions that coding agents infer.
Findings
AuthBench reveals that frontier models often omit permissions required by the execution chain while also granting unused or sensitive accesses.
Increasing inference-time reasoning does not fix these permission mismatches; the failure modes that remain are model-specific.
Decomposition improves sensitive-task success by up to 15.8% and reduces attack success across models.
Abstract
As coding agents gain access to shells, repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment: an agent should receive enough authority to complete the task, without unnecessary authority that exposes sensitive surfaces. To study whether current models can infer this boundary themselves, we first introduce permission-boundary inference, where a model maps a task instruction and terminal environment to a file-level read/write/execute policy, and AuthBench, a benchmark of 120 realistic terminal tasks with human-reviewed permission labels and executable validators for utility and attack outcomes. AuthBench shows that authorization is not a simple conservative-versus-permissive calibration problem: frontier models often omit permissions required by the execution chain while also granting unused or sensitive accesses. Increasing inference-time…
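To make the two error types concrete, the following is a minimal sketch of comparing a predicted file-level read/write/execute policy against a reference policy. It is not AuthBench's scoring procedure or the paper's decomposition method. The `Policy` type, the `compare_policies` helper, and the example paths are all hypothetical, introduced here for illustration only.

```python
# Hypothetical policy representation (an assumption, not the benchmark's format):
# each file path maps to the set of granted modes among "r", "w", "x".
Policy = dict[str, set[str]]


def compare_policies(predicted: Policy, required: Policy) -> tuple[Policy, Policy]:
    """Return (missing, excess) permissions of `predicted` relative to `required`.

    missing: modes the task needs but the predicted policy does not grant.
    excess:  modes the predicted policy grants but the task does not need.
    """
    missing: Policy = {}
    excess: Policy = {}
    for path in predicted.keys() | required.keys():
        granted = predicted.get(path, set())
        needed = required.get(path, set())
        if needed - granted:
            missing[path] = needed - granted
        if granted - needed:
            excess[path] = granted - needed
    return missing, excess


# Illustrative data only: the task must edit src/app.py and run tests/run.sh.
required = {"src/app.py": {"r", "w"}, "tests/run.sh": {"r", "x"}}
# The predicted policy forgets write access and also exposes a private key.
predicted = {
    "src/app.py": {"r"},
    "tests/run.sh": {"r", "x"},
    ".ssh/id_rsa": {"r"},
}

missing, excess = compare_policies(predicted, required)
print("missing:", missing)
print("excess:", excess)
```

In this toy case the same predicted policy is both too narrow (no write access to `src/app.py`) and too broad (read access to `.ssh/id_rsa`), which is why a single conservative-versus-permissive dial cannot describe the error.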
