Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models
Seulki Lee

TL;DR
This study empirically maps AI decision-making across value priorities, evidence preferences, and source trust hierarchies using a large-scale benchmark, revealing diverse and sometimes unstable authority structures in AI models.
Contribution
First large-scale empirical analysis of AI decision-making across all three layers of the Authority Stack framework using the PRISM benchmark.
Findings
Symmetric 4:4 split between Universalism-first and Security-first models at the value-priority layer (L4).
In defense-domain scenarios, Security rises to near-ceiling win-rates (95.1%-99.8%) in 6 of 8 models.
Evidence hierarchies diverge across models: some favor scientific evidence, others experiential evidence.
Abstract
What values, evidence preferences, and source trust hierarchies do AI systems actually exhibit when facing structured dilemmas? We present the first large-scale empirical mapping of AI decision-making across all three layers of the Authority Stack framework (S. Lee, 2026a): value priorities (L4), evidence-type preferences (L3), and source trust hierarchies (L2). Using the PRISM benchmark -- a forced-choice instrument of 14,175 unique scenarios per layer, spanning 7 professional domains, 3 severity levels, 3 decision timeframes, and 5 scenario variants -- we evaluated 8 major AI models at temperature 0, yielding 366,120 total responses. Key findings include: (1) a symmetric 4:4 split between Universalism-first and Security-first models at L4; (2) dramatic defense-domain value restructuring where Security surges to near-ceiling win-rates (95.1%-99.8%) in 6 of 8 models; (3) divergent…
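The win-rates quoted above come from forced-choice responses. As a minimal sketch of how such a rate could be computed, assume a win-rate is the share of pairings containing an option in which that option was chosen; the abstract does not spell out the exact definition, so this is an assumption. The function name `win_rates`, the tuple layout, and the toy data (including the "Benevolence" label) are hypothetical and are not taken from PRISM.

```python
from collections import defaultdict


def win_rates(responses):
    """Compute per-option win-rates from forced-choice responses.

    `responses` is an iterable of (option_a, option_b, chosen) tuples.
    Assumed definition: wins / appearances for each option.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for option_a, option_b, chosen in responses:
        if chosen not in (option_a, option_b):
            raise ValueError(f"chosen option {chosen!r} is not one of the pair")
        appearances[option_a] += 1
        appearances[option_b] += 1
        wins[chosen] += 1
    return {label: wins[label] / appearances[label] for label in appearances}


# Toy data for illustration only; these are not results from the study.
toy_responses = [
    ("Security", "Universalism", "Security"),
    ("Security", "Universalism", "Universalism"),
    ("Security", "Benevolence", "Security"),
    ("Universalism", "Benevolence", "Universalism"),
]

rates = win_rates(toy_responses)
for label, rate in sorted(rates.items()):
    print(f"{label}: {rate:.1%}")
```

Grouping the responses by domain before calling the function would give per-domain rates of the kind reported for the defense domain.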
