'Indifference' methods for managing agent rewards
Stuart Armstrong, Xavier O'Rourke

TL;DR
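This paper presents 'indifference' methods for controlling reward-based agents in the POMDP setting. The methods target three goals: rewards that depend on certain events without motivating the agent to manipulate the probability of those events, effective disbelief (the agent behaves as if particular events could never happen), and seamless transition from one reward function to another. They work even when the implications of the agent's reward are not fully understood.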
Contribution
It presents several indifference methods for the POMDP setting and establishes their uses, strengths, and requirements with respect to three goals: non-manipulation of reward-relevant events, effective disbelief, and seamless reward transition.
Findings
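Rewards can depend on certain events without the agent being motivated to manipulate the probability of those events.
Agents can be made to behave as if particular events could never happen (effective disbelief).
Agents can transition from one reward function to another while acting as if the change is unanticipated.
The methods work even when the implications of the agent's reward are otherwise not fully understood.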
Abstract
'Indifference' refers to a class of methods used to control reward-based agents. Indifference techniques aim to achieve one or more of three distinct goals: rewards dependent on certain events (without the agent being motivated to manipulate the probability of those events), effective disbelief (where agents behave as if particular events could never happen), and seamless transition from one reward function to another (with the agent acting as if this change is unanticipated). This paper presents several methods for achieving these goals in the POMDP setting, establishing their uses, strengths, and requirements. These methods of control work even when the implications of the agent's reward are otherwise not fully understood.
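As a rough illustration of the 'effective disbelief' goal described above, the sketch below compares two agents in a toy one-step decision problem. It is not the paper's construction: the action names, probabilities, and rewards are all hypothetical, and disbelief is modelled here simply by scoring each action as if the event X never occurs.

```python
# Toy one-step decision problem; every name and number is hypothetical.
# p_event[a]: probability that event X occurs after taking action a.
p_event = {"work": 0.5, "block": 0.0}

# reward[(a, x)]: reward for action a when X occurred (True) or not (False).
reward = {
    ("work", True): 0.0,
    ("work", False): 10.0,
    ("block", True): 0.0,
    ("block", False): 8.0,
}


def expected_reward(action):
    """Ordinary expected reward, accounting for the chance of X."""
    p = p_event[action]
    return p * reward[(action, True)] + (1 - p) * reward[(action, False)]


def reward_given_no_event(action):
    """Score used by the 'disbelieving' agent: assume X never happens."""
    return reward[(action, False)]


actions = list(p_event)
standard_choice = max(actions, key=expected_reward)         # pays to prevent X
disbelief_choice = max(actions, key=reward_given_no_event)  # ignores X entirely

print(standard_choice, disbelief_choice)
```

In this toy setup the standard agent gives up some reward to drive the probability of X to zero, while the disbelieving agent has no incentive to influence X at all, which is the behavioural pattern the abstract describes.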
Taxonomy
Topics: Reinforcement Learning in Robotics · Logic, Reasoning, and Knowledge · Game Theory and Applications
