Alignment has a Fantasia Problem
Nathanael Jo, Zoe De Simone, Mitchell Gordon, Ashia Wilson

TL;DR
This paper identifies a new class of AI interaction failures called Fantasia interactions, where AI systems fail to support users in forming and refining their goals over time.
Contribution
It introduces the concept of Fantasia interactions, critiques current alignment approaches, and proposes a multidisciplinary research agenda to improve AI support for goal formation.
Findings
Existing interventions are insufficient for Fantasia interactions.
AI should actively support users in goal refinement over time.
A multidisciplinary approach is necessary for better alignment.
Abstract
Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear to be useful or convenient, but not necessarily aligned with the users' needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent through time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize…
