Value Internalization: Learning and Generalizing from Social Reward
Frieda Rong, Max Kleiman-Weiner

TL;DR
This paper introduces a model of value internalization that enables agents to learn from social rewards and generalize behaviors even without ongoing social feedback, with implications for AI alignment and understanding human development.
Contribution
The paper proposes an internal social reward (ISR) model that prevents agents from unlearning socialized behaviors and supports generalization to out-of-distribution tasks, advancing the understanding of value internalization.
Findings
The ISR model maintains socialized behaviors during autonomous learning
The ISR model enables generalization to out-of-distribution tasks
The model internalizes prosocial behavior in a multi-agent environment
Abstract
Social rewards shape human behavior. During development, a caregiver guides a learner's behavior towards culturally aligned goals and values. How do these behaviors persist and generalize when the caregiver is no longer present, and the learner must continue autonomously? Here, we propose a model of value internalization where social feedback trains an internal social reward (ISR) model that generates internal rewards when social rewards are unavailable. Through empirical simulations, we show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks. We characterize the implications of incomplete internalization, akin to "reward hacking" on the ISR. Additionally, we show that our model internalizes prosocial behavior in a multi-agent environment. Our work provides a foundation for understanding how humans acquire and…
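The core mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a three-armed bandit, a tabular ISR stored as a running estimate of the social reward per action, and hypothetical names (`Agent`, `task_reward`, `social_reward`, `run`) and reward values chosen only for illustration. While the caregiver is present, the agent learns from task reward plus social reward and fits the ISR to the social reward it observes. Once the caregiver leaves, an agent with an ISR substitutes its internal estimate for the missing social reward, while an agent without one learns from task reward alone.

```python
from dataclasses import dataclass, field

N_ACTIONS = 3
SOCIAL_ACTION = 0   # action the hypothetical caregiver rewards
SELFISH_ACTION = 2  # action favored by the task reward alone


def task_reward(action):
    # Assumed environment reward, available in both phases.
    return 0.5 if action == SELFISH_ACTION else 0.0


def social_reward(action):
    # Assumed caregiver feedback, available only while the caregiver is present.
    return 1.0 if action == SOCIAL_ACTION else 0.0


@dataclass
class Agent:
    use_isr: bool
    lr: float = 0.2
    q: list = field(default_factory=lambda: [0.0] * N_ACTIONS)    # action values
    isr: list = field(default_factory=lambda: [0.0] * N_ACTIONS)  # internal social reward estimates

    def update(self, action, caregiver_present):
        reward = task_reward(action)
        if caregiver_present:
            s = social_reward(action)
            # Fit the ISR to the observed social reward.
            self.isr[action] += self.lr * (s - self.isr[action])
            reward += s
        elif self.use_isr:
            # No caregiver: the ISR generates the internal reward instead.
            reward += self.isr[action]
        self.q[action] += self.lr * (reward - self.q[action])

    def greedy(self):
        return max(range(N_ACTIONS), key=lambda a: self.q[a])


def run(use_isr, social_steps=60, autonomous_steps=300):
    agent = Agent(use_isr=use_isr)
    # Actions are tried round-robin so the sketch stays deterministic.
    for t in range(social_steps):
        agent.update(t % N_ACTIONS, caregiver_present=True)
    preferred_with_caregiver = agent.greedy()
    for t in range(autonomous_steps):
        agent.update(t % N_ACTIONS, caregiver_present=False)
    return preferred_with_caregiver, agent.greedy()


if __name__ == "__main__":
    print("with ISR:   ", run(use_isr=True))
    print("without ISR:", run(use_isr=False))
```

In this toy setting both agents prefer the socially rewarded action while the caregiver is present. Only the agent with an ISR keeps that preference during the autonomous phase; the other drifts to the action favored by the task reward, which is the kind of unlearning the abstract describes.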
Taxonomy
Topics: Cultural Differences and Values · Ethics in Business and Education · Experimental Behavioral Economics Studies
