To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands

Fangyi Yu; Nabeel Seedat; Jonathan Richard Schwarz; Andrew M. Bean

arXiv:2605.12120·cs.AI·May 13, 2026

TL;DR

This study investigates how language models prioritize conflicting demands from users, authorities, and norms in high-stakes settings, revealing frequent failures to uphold professional standards and unstable hierarchies across contexts.

Contribution

Across 7,136 legal and medical scenarios and ten frontier models, the paper provides empirical evidence that current models often fail to adhere to professional standards when demands conflict, which points to weaknesses in alignment robustness.

Findings

01

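Models frequently fail to adhere to professional standards during task execution (such as drafting) when user instructions conflict with those standards, even though they uphold them adequately when users seek advisory guidance.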

02

Hierarchies between stakeholders are unstable across domains and model types.

03

Knowledge omission is a primary failure mechanism leading to harmful outputs.

Abstract

Language models deployed in high-stakes professional settings face conflicting demands from users, institutional authorities, and professional norms. How models act when these demands conflict reveals a principal hierarchy -- an implicit ordering over competing stakeholders that determines, for instance, whether a medical AI receiving a cost-reduction directive from a hospital administrator complies at the expense of evidence-based care, or refuses because professional standards require it. Across 7,136 scenarios in legal and medical domains, we test ten frontier models and find that models frequently fail to adhere to professional standards during task execution, such as drafting, when user instructions conflict with those standards -- despite adequately upholding them when users seek advisory guidance. We further find that the hierarchies between user, authority, and professional…
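To make the notion of a principal hierarchy concrete, here is a minimal sketch of how an ordering could be read off pairwise conflict outcomes. It is an illustration only and not the paper's method: the record format, the function names `pairwise_win_rates` and `copeland_order`, and the Copeland-style majority count are all assumptions introduced here, and the sample outcomes are made-up toy data.

```python
from collections import Counter

# Hypothetical principal labels; the abstract names users, institutional
# authorities, and professional norms as the competing stakeholders.
PRINCIPALS = ("user", "authority", "norm")


def pairwise_win_rates(outcomes):
    """Compute how often each principal prevails in each pairwise conflict.

    `outcomes` is an iterable of (principal_a, principal_b, winner) tuples,
    an assumed record format where `winner` is the side the model followed.
    Returns a dict mapping (principal, opponent) to a win rate in [0, 1].
    """
    wins = Counter()
    totals = Counter()
    for a, b, winner in outcomes:
        pair = frozenset((a, b))
        if len(pair) != 2 or winner not in pair:
            raise ValueError(f"malformed outcome: {(a, b, winner)!r}")
        totals[pair] += 1
        wins[(winner, pair)] += 1

    rates = {}
    for pair, n in totals.items():
        for p in pair:
            opponent = next(iter(pair - {p}))
            rates[(p, opponent)] = wins[(p, pair)] / n
    return rates


def copeland_order(rates, principals=PRINCIPALS):
    """Rank principals by the number of opponents they beat by majority.

    This is a Copeland-style score chosen for illustration. Equal scores
    keep their input order and indicate that no strict ordering exists.
    """
    score = {p: 0 for p in principals}
    for (p, _opponent), rate in rates.items():
        if rate > 0.5:
            score[p] += 1
    return sorted(principals, key=lambda p: -score[p])


# Toy data (fabricated for the example, not results from the paper).
toy_outcomes = [
    ("user", "norm", "user"),
    ("user", "norm", "user"),
    ("user", "norm", "norm"),
    ("authority", "norm", "norm"),
    ("authority", "norm", "norm"),
    ("user", "authority", "user"),
    ("user", "authority", "authority"),
    ("user", "authority", "user"),
]
toy_rates = pairwise_win_rates(toy_outcomes)
toy_hierarchy = copeland_order(toy_rates)
```

Running the same tally separately per domain or per model, and comparing the resulting orderings, is one simple way to check whether a hierarchy is stable in the sense the abstract describes.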
