'Indifference' methods for managing agent rewards

Stuart Armstrong; Xavier O'Rourke

arXiv:1712.06365 · cs.AI · June 6, 2018 · 5 cites

Open Access

TL;DR

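This paper presents 'indifference' methods for controlling reward-based agents in the POMDP setting. The methods target three goals: rewards that depend on certain events without motivating the agent to manipulate the probability of those events, effective disbelief (the agent behaves as if particular events could never happen), and seamless transition from one reward function to another. They work even when the implications of the agent's reward are not fully understood.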

Contribution

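It presents several 'indifference' methods for managing agent rewards in the POMDP setting and establishes their uses, strengths, and requirements for three goals: event-dependent rewards without manipulation incentives, effective disbelief, and seamless reward transition.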

Findings

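01. Rewards can depend on certain events without the agent being motivated to manipulate the probability of those events.

02. Agents can be made to behave as if particular events could never happen (effective disbelief).

03. The methods work even when the implications of the agent's reward are otherwise not fully understood.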

Abstract

'Indifference' refers to a class of methods used to control reward-based agents. Indifference techniques aim to achieve one or more of three distinct goals: rewards dependent on certain events (without the agent being motivated to manipulate the probability of those events), effective disbelief (where agents behave as if particular events could never happen), and seamless transition from one reward function to another (with the agent acting as if this change is unanticipated). This paper presents several methods for achieving these goals in the POMDP setting, establishing their uses, strengths, and requirements. These methods of control work even when the implications of the agent's reward are otherwise not fully understood.
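
To make the "seamless transition" goal concrete, the following toy sketch shows why an agent might manipulate the probability of a reward change, and how a compensatory payment can remove that incentive. It is a one-step illustration only and does not reproduce the paper's POMDP constructions. The actions, probabilities, and values (`ACTIONS`, `V_OLD`, `V_NEW`) are hypothetical, and the compensation rule used here (pay the difference in attainable value when the change occurs) is an assumption chosen for simplicity.

```python
# Toy illustration (hypothetical numbers, not taken from the paper).
# Event X = "the agent's reward function is changed".
# The agent can leave X alone or pay a small cost to block it.

# action -> (P(X | action), cost of taking the action)
ACTIONS = {"leave": (0.5, 0.0), "block": (0.0, 0.1)}

V_OLD = 10.0  # value the agent can attain if its old reward stays in place
V_NEW = 1.0   # value the agent can attain after the change, under the new reward


def expected_value(p_x, cost, compensation):
    """Expected value of an action, with an optional payment made only if X occurs."""
    return p_x * (V_NEW + compensation) + (1.0 - p_x) * V_OLD - cost


def best_action(compensation):
    """Return the action with the highest expected value."""
    return max(
        ACTIONS,
        key=lambda a: expected_value(ACTIONS[a][0], ACTIONS[a][1], compensation),
    )


# Without compensation, X lowers the agent's expected value, so it blocks X.
naive_choice = best_action(compensation=0.0)

# Paying V_OLD - V_NEW when X occurs makes the value the same whether or not
# X happens, so paying a cost to influence P(X) is no longer worthwhile.
indifferent_choice = best_action(compensation=V_OLD - V_NEW)

print(naive_choice, indifferent_choice)
```

In this sketch the uncompensated agent prefers to block the change, while the compensated agent leaves it alone, because its expected value no longer depends on the probability of the event.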

Taxonomy

Topics: Reinforcement Learning in Robotics · Logic, Reasoning, and Knowledge · Game Theory and Applications