Beyond Preferences in AI Alignment
Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton

TL;DR
This paper critiques the preference-based approach to AI alignment, arguing for a shift towards normative standards aligned with social roles and stakeholder consensus to better accommodate human values and diversity.
Contribution
It challenges the assumptions of preference-based AI alignment and proposes an alternative focus on normative standards suited to social roles and stakeholder negotiation.
Findings
Preferences fail to capture complex human values.
Expected utility theory is normatively silent and often inapplicable.
Alignment should focus on normative standards and stakeholder consensus.
Abstract
The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability…
Taxonomy
Topics: Manufacturing Process and Optimization
