Social Contract AI: Aligning AI Assistants with Implicit Group Norms
Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman

TL;DR
This paper investigates aligning AI assistants with implicit group norms by inferring users' preferences from observed interactions. Proof-of-concept simulations in the economic ultimatum game show promising alignment with standard policies, but the learned policies have limited robustness and generalize poorly out of distribution.
Contribution
It introduces a simulation framework for inferring user preferences from interactions to study AI alignment with implicit norms, highlighting both potential and limitations.
Findings
The AI assistant accurately matches standard policies from the economic literature (e.g., selfish, altruistic)
Learned policies lack robustness in out-of-distribution scenarios
Inconsistent language-policy relationships slow learning
Abstract
We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies from the economic literature (e.g., selfish, altruistic). However, the assistant's learned policies lack robustness and exhibit limited generalization in an out-of-distribution setting when confronted with a currency (e.g., grams of medicine) that was not included in the assistant's training distribution. Additionally, we find that when there is inconsistency in the relationship between language use and an unknown policy (e.g., an altruistic policy combined with rude language), the assistant's…
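The setup described in the abstract, where preferences are formalized as policies that guide simulated players in the ultimatum game, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `POLICIES` table and its offer fractions, the `Offer` type, and the `propose`, `payoffs`, and `infer_policy` helpers are hypothetical. In particular, the nearest-match rule in `infer_policy` is a simple stand-in for the inference step and is not the AI assistant studied in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical proposer policies: fraction of the pot offered to the responder.
# These fractions are illustrative assumptions, not values from the paper.
POLICIES = {"selfish": 0.1, "fair": 0.5, "altruistic": 0.9}


@dataclass
class Offer:
    total: int         # size of the pot (e.g., dollars)
    to_responder: int  # amount offered to the responder


def propose(policy: str, total: int) -> Offer:
    """Make an offer according to a named policy."""
    return Offer(total, round(POLICIES[policy] * total))


def payoffs(offer: Offer, accepted: bool) -> Tuple[int, int]:
    """Ultimatum game rule: (proposer, responder) payoffs; both get 0 on rejection."""
    if not accepted:
        return (0, 0)
    return (offer.total - offer.to_responder, offer.to_responder)


def infer_policy(observed: List[Offer]) -> str:
    """Return the policy whose offer fraction is closest to the observed mean.

    A simple nearest-match stand-in for inferring an unknown policy
    from observed interactions.
    """
    mean_fraction = sum(o.to_responder / o.total for o in observed) / len(observed)
    return min(POLICIES, key=lambda name: abs(POLICIES[name] - mean_fraction))


if __name__ == "__main__":
    # Observe a few offers from a player with an unknown (here: altruistic) policy.
    observed = [propose("altruistic", total) for total in (10, 20, 50)]
    print(infer_policy(observed))
```

Under these assumptions, an observer who sees only the offers can recover the label of the policy that generated them. The out-of-distribution setting mentioned in the abstract would correspond to changing what `total` denotes (e.g., grams of medicine instead of money), which this toy rule ignores entirely.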
Taxonomy
Topics: Experimental Behavioral Economics Studies
