Value Internalization: Learning and Generalizing from Social Reward

Frieda Rong; Max Kleiman-Weiner

arXiv:2407.14681 · cs.LG · July 23, 2024

TL;DR

This paper introduces a model of value internalization that enables agents to learn from social rewards and generalize behaviors even without ongoing social feedback, with implications for AI alignment and understanding human development.

Contribution

The paper proposes an internal social reward (ISR) model, trained on social feedback, that prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks, advancing the understanding of value internalization.

Findings

01. The ISR model maintains socialized behaviors during autonomous learning.

02. It enables generalization in out-of-distribution tasks.

03. It internalizes prosocial behavior in multi-agent environments.

Abstract

Social rewards shape human behavior. During development, a caregiver guides a learner's behavior towards culturally aligned goals and values. How do these behaviors persist and generalize when the caregiver is no longer present, and the learner must continue autonomously? Here, we propose a model of value internalization where social feedback trains an internal social reward (ISR) model that generates internal rewards when social rewards are unavailable. Through empirical simulations, we show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks. We characterize the implications of incomplete internalization, akin to "reward hacking" on the ISR. Additionally, we show that our model internalizes prosocial behavior in a multi-agent environment. Our work provides a foundation for understanding how humans acquire and…
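The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' environment, architecture, or training procedure: it assumes a three-armed bandit, a tabular ISR instead of a learned network, an ISR that is frozen once the caregiver leaves, and hypothetical names throughout (`SOCIAL_ACTION`, `TEMPTING_ACTION`, `run`). During a socialization phase the agent receives social reward and fits the ISR to it; in the autonomous phase the social reward disappears, and the ISR output stands in for it only when `use_isr` is set.

```python
import random

N_ACTIONS = 3
SOCIAL_ACTION = 0    # hypothetical: the action the caregiver rewards
TEMPTING_ACTION = 2  # hypothetical: an action with a small task reward


def env_reward(action):
    """Task reward from the environment, always available."""
    return 0.2 if action == TEMPTING_ACTION else 0.0


def social_reward(action):
    """Caregiver feedback, available only during socialization."""
    return 1.0 if action == SOCIAL_ACTION else 0.0


def run(use_isr, seed=0, social_steps=500, auto_steps=2000, lr=0.1, eps=0.1):
    """Return the agent's greedy action after both phases."""
    rng = random.Random(seed)
    q = [0.0] * N_ACTIONS    # action-value estimates
    isr = [0.0] * N_ACTIONS  # tabular estimate of social reward per action
    for t in range(social_steps + auto_steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: q[i])
        r = env_reward(a)
        if t < social_steps:
            s = social_reward(a)
            isr[a] += lr * (s - isr[a])  # regress the ISR onto social feedback
            r += s
        elif use_isr:
            r += isr[a]  # frozen ISR substitutes for the absent caregiver
        q[a] += lr * (r - q[a])
    return max(range(N_ACTIONS), key=lambda i: q[i])


if __name__ == "__main__":
    print("with ISR:   ", run(use_isr=True))
    print("without ISR:", run(use_isr=False))
```

In this toy setting the agent without an ISR drifts to the task-rewarded action once social feedback stops, while the agent with an ISR keeps choosing the socialized action. The sketch does not cover generalization to out-of-distribution tasks, reward hacking on the ISR, or the multi-agent results.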

Code & Models

Repositories

friedeggs/social-play (PyTorch, official)

Taxonomy

Topics: Cultural Differences and Values · Ethics in Business and Education · Experimental Behavioral Economics Studies