Moral Change or Noise? On Problems of Aligning AI With Temporally Unstable Human Feedback
Vijay Keswani, Cyrus Cousins, Breanna Nguyen, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, and Walter Sinnott-Armstrong

TL;DR
This paper investigates how human moral preferences evolve over time and the implications for AI alignment, revealing that preferences are often unstable and that current models struggle to adapt, raising challenges for trustworthy AI in high-stakes domains.
Contribution
It provides empirical evidence of moral preference instability over time and analyzes its impact on AI alignment, emphasizing the need to account for dynamic human values.
Findings
Participants change responses 6-20% of the time across sessions.
Significant shifts observed in participants' decision models over time.
Predictive performance of AI models decreases as preference and model instability increase.
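The page does not say how the 6-20% change rate is computed. The following is a minimal sketch of one plausible way to compute it, not the authors' code. It assumes each participant answers the same set of binary-choice scenarios in two sessions, with answers encoded as 0/1. The function name `response_change_rate` and that encoding are hypothetical.

```python
from typing import Sequence


def response_change_rate(session_a: Sequence[int], session_b: Sequence[int]) -> float:
    """Fraction of repeated scenarios on which a participant's answer differs.

    Hypothetical helper: assumes both sequences list answers to the same
    scenarios in the same order (e.g., 0 = first option, 1 = second option).
    """
    if len(session_a) != len(session_b):
        raise ValueError("sessions must cover the same scenarios in the same order")
    if not session_a:
        raise ValueError("no scenarios to compare")
    # Count scenarios where the answer in session B differs from session A.
    changed = sum(1 for a, b in zip(session_a, session_b) if a != b)
    return changed / len(session_a)


if __name__ == "__main__":
    # Illustrative, made-up answers for one participant across two sessions.
    session_1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
    session_2 = [0, 1, 0, 0, 1, 0, 0, 1, 1, 0]
    print(response_change_rate(session_1, session_2))  # one change in ten scenarios
```

With these illustrative answers, one of ten scenarios flips, giving a rate of 0.1. This simple measure treats every flip the same; it cannot by itself distinguish a "legitimate" change in moral reasoning from noise.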
Abstract
Alignment methods in moral domains seek to elicit moral preferences of human stakeholders and incorporate them into AI. This presupposes moral preferences as static targets, but such preferences often evolve over time. Proper alignment of AI to dynamic human preferences should ideally account for "legitimate" changes to moral reasoning, while ignoring changes related to attention deficits, cognitive biases, or other arbitrary factors. However, common AI alignment approaches largely neglect temporal changes in preferences, posing serious challenges to proper alignment, especially in high-stakes applications of AI, e.g., in healthcare domains, where misalignment can jeopardize the trustworthiness of the system and yield serious individual and societal harms. This work investigates the extent to which people's moral preferences change over time, and the impact of such changes on AI…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
