A Statistical Case Against Empirical Human-AI Alignment
Julian Rodemann, Esteban Garces Arias, Christoph Luther, Christoph Jansen, Thomas Augustin

TL;DR
This paper argues that naive empirical human-AI alignment can introduce biases and suggests alternative approaches, supported by examples like human-centric language model decoding.
Contribution
It presents a principled critique of empirical alignment and proposes prescriptive and a posteriori empirical alignment as better alternatives.
Findings
Naive empirical alignment can introduce statistical biases.
Prescriptive and a posteriori approaches mitigate bias issues.
Examples include human-centric decoding of language models.
Abstract
Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While its goals are noble, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument with tangible examples such as human-centric decoding of language models.
Taxonomy
Topics: Machine Learning and Data Classification · AI-based Problem Solving and Planning · Explainable Artificial Intelligence (XAI)
