Combining Reward Information from Multiple Sources
Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof

TL;DR
This paper addresses the challenge of combining conflicting reward signals from multiple sources by proposing a new algorithm, Multitask Inverse Reward Design (MIRD), that balances conservatism and informativeness in reward modeling.
Contribution
The paper introduces the Multitask Inverse Reward Design (MIRD) algorithm, a novel approach for integrating conflicting reward information and mitigating model misspecification effects.
Findings
MIRD effectively balances conservatism and informativeness.
The MIRD-IF variant improves performance in conflicting-reward scenarios.
Theoretical and empirical analysis demonstrates MIRD's advantages.
Abstract
Given two sources of evidence about a latent variable, one can combine the information from both by multiplying the likelihoods of each piece of evidence. However, when one or both of the observation models are misspecified, the distributions will conflict. We study this problem in the setting with two conflicting reward functions learned from different sources. In such a setting, we would like to retreat to a broader distribution over reward functions, in order to mitigate the effects of misspecification. We assume that an agent will maximize expected reward given this distribution over reward functions, and identify four desiderata for this setting. We propose a novel algorithm, Multitask Inverse Reward Design (MIRD), and compare it to a range of simple baselines. While all methods must trade off between conservatism and informativeness, through a combination of theory and empirical…
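The opening claim of the abstract, that two sources of evidence about a latent variable can be combined by multiplying their likelihoods, can be illustrated with a small discrete example. This is only a sketch of that generic Bayesian step, not of MIRD itself; the four-hypothesis latent variable, the likelihood values, and the names `combine`, `lik_a`, `lik_b_agree`, and `lik_b_conflict` are all hypothetical and chosen for illustration.

```python
import numpy as np

def combine(prior, *likelihoods):
    """Multiply a prior by each likelihood vector and renormalize.

    Returns (posterior, evidence), where evidence is the total
    unnormalized mass before renormalization.
    """
    unnorm = np.asarray(prior, dtype=float).copy()
    for lik in likelihoods:
        unnorm = unnorm * np.asarray(lik, dtype=float)
    evidence = unnorm.sum()
    if evidence == 0.0:
        raise ValueError("the likelihoods assign zero mass to every hypothesis")
    return unnorm / evidence, evidence

# Hypothetical setup: four candidate values of the latent variable, uniform prior.
prior = np.full(4, 0.25)

# Source A strongly favours hypothesis 0.
lik_a = np.array([0.90, 0.08, 0.01, 0.01])

# Source B either roughly agrees with A or points at hypothesis 3 instead.
lik_b_agree = np.array([0.70, 0.20, 0.05, 0.05])
lik_b_conflict = np.array([0.01, 0.01, 0.08, 0.90])

post_agree, ev_agree = combine(prior, lik_a, lik_b_agree)
post_conflict, ev_conflict = combine(prior, lik_a, lik_b_conflict)

print("agreeing sources:   ", np.round(post_agree, 3), "evidence:", ev_agree)
print("conflicting sources:", np.round(post_conflict, 3), "evidence:", ev_conflict)
```

In this toy setup the agreeing sources produce a posterior concentrated on hypothesis 0, while the conflicting sources leave the posterior split between hypotheses 0 and 3 with much smaller total evidence. That is one way to see why multiplying likelihoods becomes unreliable when an observation model is misspecified.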
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
