Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

Md Sultan Al Nahian; Spencer Frazier; Brent Harrison; Mark Riedl

arXiv:2104.09469·cs.LG·April 20, 2021·6 cites

Open Access

TL;DR

This paper proposes a reinforcement learning method that incorporates a normative prior to ensure agents behave in socially acceptable ways while performing tasks, tested across three text-based environments.

Contribution

It introduces a dual-reward reinforcement learning framework combining task performance with normative behavior, leveraging a normative prior model for improved societal alignment.
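As a rough illustration of how a task signal and a normative signal might be combined at action-selection time, the sketch below uses a simple multiplicative form of policy shaping. It is an assumption-laden example, not the authors' implementation: the function names (`softmax`, `shape_policy`), the temperature parameter, and the sample numbers are all hypothetical, and the normative probabilities stand in for the output of a normative prior model scoring each candidate action.

```python
import math

def softmax(values, temperature=1.0):
    """Turn raw action values into a probability distribution."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def shape_policy(task_q_values, normative_probs, temperature=1.0):
    """Combine a task policy with per-action normative probabilities.

    Assumption: multiplicative policy shaping, i.e. each action's task
    probability is scaled by the probability that the action is normative,
    then the result is renormalized. Hypothetical helper for illustration.
    """
    task_policy = softmax(task_q_values, temperature)
    combined = [p * n for p, n in zip(task_policy, normative_probs)]
    total = sum(combined)
    if total == 0.0:
        # Every action was judged fully non-normative; fall back to the task policy.
        return task_policy
    return [c / total for c in combined]

# Made-up example: action 0 has the highest task value but is rated non-normative.
q_values = [2.0, 1.0, 0.5]
normative = [0.05, 0.9, 0.8]
shaped = shape_policy(q_values, normative)
best_action = shaped.index(max(shaped))
```

In this made-up example the shaped distribution shifts probability away from the high-value but non-normative action, which is the kind of trade-off between task effectiveness and normative compliance the framework aims for.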

Findings

01 Agents trained with normative rewards exhibit more socially acceptable behaviors.

02 The approach balances task effectiveness with normative compliance.

03 Testing across three environments demonstrates improved normative behavior.

Abstract

As more machine learning agents interact with humans, it is increasingly likely that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm. Value alignment is a property of intelligent agents wherein they solely pursue non-harmful behaviors or human-beneficial goals. We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward. The normative behavior reward is derived from a value-aligned prior model previously shown to classify text as normative or non-normative. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative. We test our…

Taxonomy

Topics: Ethics and Social Impacts of AI