Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Jan-Philipp Fränken; Sam Kwok; Peixuan Ye; Kanishk Gandhi; Dilip Arumugam; Jared Moore; Alex Tamkin; Tobias Gerstenberg; Noah D. Goodman

arXiv:2310.17769 · cs.CL · December 5, 2023 · 1 citation

TL;DR

This paper investigates aligning AI assistants with implicit group norms by inferring users' preferences from observed interactions. Simulations in the economic ultimatum game show that the assistant matches standard policies (e.g., selfish, altruistic), but that its learned policies lack robustness and generalize poorly out of distribution.

Contribution

The paper introduces a simulation framework in which an AI assistant infers users' preferences from observed interactions, and uses it to study both the potential and the limitations of aligning assistants with implicit norms.

Findings

01 The AI assistant accurately matches standard policies from the economic literature (e.g., selfish, altruistic)

02 Learned policies lack robustness in out-of-distribution scenarios

03 Inconsistent language-policy relationships slow learning

Abstract

We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies from the economic literature (e.g., selfish, altruistic). However, the assistant's learned policies lack robustness and exhibit limited generalization in an out-of-distribution setting when confronted with a currency (e.g., grams of medicine) that was not included in the assistant's training distribution. Additionally, we find that when there is inconsistency in the relationship between language use and an unknown policy (e.g., an altruistic policy combined with rude language), the assistant's…
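The setup described in the abstract can be illustrated with a minimal sketch, assuming a much simpler model than the paper's. It plays one round of the ultimatum game between two simulated players whose preferences are encoded as policies, and then guesses which policy produced a set of observed offers. The `Policy` class, the percentage values, and the nearest-match rule in `infer_policy` are all hypothetical choices made for illustration; they are not taken from the paper or from the `janphilippfranken/scai` repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical policy: how a player proposes and responds."""
    name: str
    offer_percent: int        # share of the pot offered when proposing
    min_accept_percent: int   # smallest share accepted when responding


# Assumed parameter values, chosen only for illustration.
SELFISH = Policy("selfish", offer_percent=10, min_accept_percent=0)
ALTRUISTIC = Policy("altruistic", offer_percent=60, min_accept_percent=0)


def proposed_offer(policy: Policy, pot: int) -> int:
    """Amount the proposer offers to the responder."""
    return pot * policy.offer_percent // 100


def play_round(proposer: Policy, responder: Policy, pot: int) -> tuple[int, int]:
    """Return (proposer payoff, responder payoff) for one round.

    If the responder rejects the offer, both players receive nothing.
    """
    offer = proposed_offer(proposer, pot)
    threshold = pot * responder.min_accept_percent // 100
    if offer >= threshold:
        return pot - offer, offer
    return 0, 0


def infer_policy(observed_offers: list[int], pot: int,
                 candidates: list[Policy]) -> Policy:
    """Pick the candidate whose predicted offer is closest to the observations.

    This nearest-match rule stands in for preference inference; it is an
    assumption of this sketch, not the method used in the paper.
    """
    def total_error(policy: Policy) -> int:
        predicted = proposed_offer(policy, pot)
        return sum(abs(offer - predicted) for offer in observed_offers)

    return min(candidates, key=total_error)


if __name__ == "__main__":
    print(play_round(SELFISH, ALTRUISTIC, pot=100))
    print(infer_policy([55, 60, 65], 100, [SELFISH, ALTRUISTIC]).name)
```

The pot is kept as an integer so that the currency is only a label; swapping dollars for another unit would not change this toy model, which is one way it differs from the language-based assistant studied in the paper.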

Code & Models

Repositories

janphilippfranken/scai (Official)

Taxonomy

Topics: Experimental Behavioral Economics Studies