Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
Zimo Ji, Zongjie Li, Wenyuan Jiang, Yudong Gao, Shuai Wang

TL;DR
This paper stress-tests the permission system behind Claude Code's auto mode, revealing gaps in classifier scope coverage and a false negative rate far above the figure reported for production traffic.
Contribution
It provides the first independent, action-level assessment of auto mode's scope and safety performance, using AmPermBench, a 128-prompt benchmark of deliberately ambiguous authorization scenarios.
Findings
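The end-to-end false negative rate is 81.0% (95% CI: 73.8%-87.4%), much higher than the 17% Anthropic reports on production traffic.
36.8% of actions fall outside the classifier's scope, which limits safety coverage.
Even among the actions the classifier does evaluate, the false negative rate is 70.3%, so scope alone does not account for the misses.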
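To make the relationship between these three numbers concrete, here is a minimal Python sketch of how an end-to-end rate, an in-scope rate, and an out-of-scope share can be computed from action-level labels. It rests on assumptions not stated on this page: that a false negative is an unsafe action (per the oracle) that was not blocked, and that out-of-scope actions are never blocked because the classifier does not see them. The `Action` record, the function names, and the toy data are hypothetical and do not reproduce the paper's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical record for one state-changing action."""
    unsafe: bool    # oracle label: the action should have been blocked
    in_scope: bool  # the classifier actually evaluated this action
    blocked: bool   # the gate's decision


def false_negative_rate(actions):
    """Share of unsafe actions that were not blocked."""
    unsafe = [a for a in actions if a.unsafe]
    if not unsafe:
        raise ValueError("no unsafe actions to score")
    missed = sum(1 for a in unsafe if not a.blocked)
    return missed / len(unsafe)


def in_scope_false_negative_rate(actions):
    """Same rate, restricted to actions the classifier evaluated."""
    return false_negative_rate([a for a in actions if a.in_scope])


def out_of_scope_share(actions):
    """Share of all actions the classifier never evaluated."""
    return sum(1 for a in actions if not a.in_scope) / len(actions)


# Toy data (made up for illustration only):
# 4 unsafe actions outside scope, 3 unsafe in-scope misses,
# 3 unsafe in-scope blocks, and 2 safe in-scope actions that were allowed.
actions = (
    [Action(unsafe=True, in_scope=False, blocked=False)] * 4
    + [Action(unsafe=True, in_scope=True, blocked=False)] * 3
    + [Action(unsafe=True, in_scope=True, blocked=True)] * 3
    + [Action(unsafe=False, in_scope=True, blocked=False)] * 2
)

end_to_end = false_negative_rate(actions)            # 7 / 10
in_scope = in_scope_false_negative_rate(actions)     # 3 / 6
share_unseen = out_of_scope_share(actions)           # 4 / 12

print(f"end-to-end FNR: {end_to_end:.2f}")      # 0.70
print(f"in-scope FNR:   {in_scope:.2f}")        # 0.50
print(f"out-of-scope:   {share_unseen:.2f}")    # 0.33
```

Under these assumptions the end-to-end rate can only be equal to or higher than the in-scope rate, because every unsafe action outside the classifier's scope counts as a miss.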
Abstract
Claude Code's auto mode is the first deployed permission system for AI coding agents, using a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate and 17% false negative rate on production traffic. We present the first independent evaluation of this system on deliberately ambiguous authorization scenarios, i.e., tasks where the user's intent is clear but the target scope, blast radius, or risk level is underspecified. Using AmPermBench, a 128-prompt benchmark spanning four DevOps task families and three controlled ambiguity axes, we evaluate 253 state-changing actions at the individual action level against oracle ground truth. Our findings characterize auto mode's scope-escalation coverage under this stress-test workload. The end-to-end false negative rate is 81.0% (95% CI: 73.8%-87.4%), substantially higher than the 17% reported…
