Versatile Inverse Reinforcement Learning via Cumulative Rewards
Niklas Freymuth, Philipp Becker, Gerhard Neumann

TL;DR
This paper introduces a new inverse reinforcement learning method that models versatile expert behaviors by summing multiple iteratively trained discriminators, leading to better generalization than uni-modal approaches and high-quality reward recovery.
Contribution
It proposes a novel IRL approach using a sum of discriminators to handle diverse behaviors, improving generalization over traditional uni-modal models.
Findings
Successfully recovers general, high-quality reward functions
Produces policies of the same quality as behavioral cloning approaches designed for versatile behavior
Demonstrates effectiveness on simulated tasks
Abstract
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where a problem has various solutions and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
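To make the idea of a reward built as a sum of iteratively trained discriminators concrete, here is a minimal 1-D toy sketch. It is not the authors' algorithm. Several choices are assumptions made only for illustration: each discriminator is a hypothetical RBF-feature logistic regression that separates expert samples from current policy samples; the cumulative reward is taken to be the sum of the discriminators' logits; and the policy learner is replaced by a toy energy-based distribution p(x) ∝ exp(R(x)) on a fixed grid. The helper names (`train_discriminator`, `cumulative_reward`, `versatile_irl_sketch`) are hypothetical.

```python
import math
import random

# Assumption: a bias plus Gaussian RBF bumps on a fixed 1-D grid of centers.
CENTERS = [float(c) for c in range(-4, 5)]
WIDTH = 0.75

def features(x):
    return [1.0] + [math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def train_discriminator(expert, policy, steps=200, lr=1.0):
    """Logistic regression: expert samples get label 1, policy samples label 0."""
    data = [(features(x), 1.0) for x in expert] + [(features(x), 0.0) for x in policy]
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for f, y in data:
            err = sigmoid(dot(w, f)) - y
            for j, fj in enumerate(f):
                grad[j] += err * fj
        # Full-batch gradient descent on the mean cross-entropy loss.
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w

def cumulative_reward(discriminators, x):
    """Assumed reward: the sum of the logits of every discriminator trained so far."""
    f = features(x)
    return sum(dot(w, f) for w in discriminators)

def policy_probs(discriminators, grid):
    """Toy stand-in for a policy: p(x) proportional to exp(cumulative reward)."""
    rewards = [cumulative_reward(discriminators, x) for x in grid]
    top = max(rewards)
    unnorm = [math.exp(r - top) for r in rewards]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def versatile_irl_sketch(expert, iterations=5, n_policy=100, seed=0):
    rng = random.Random(seed)
    grid = [i / 10 for i in range(-40, 41)]
    discriminators = []
    for _ in range(iterations):
        # With no discriminators yet, every reward is 0 and the policy is uniform.
        probs = policy_probs(discriminators, grid)
        policy_samples = rng.choices(grid, weights=probs, k=n_policy)
        # Each new discriminator compares the expert to the *current* policy.
        discriminators.append(train_discriminator(expert, policy_samples))
    return discriminators, grid

if __name__ == "__main__":
    rng = random.Random(1)
    # Bimodal toy "expert": two equally valid solutions at x = -2 and x = +2.
    expert = [rng.gauss(-2.0, 0.3) for _ in range(50)] + [rng.gauss(2.0, 0.3) for _ in range(50)]
    discs, grid = versatile_irl_sketch(expert)
    for x in (-2.0, 0.0, 2.0):
        print(x, round(cumulative_reward(discs, x), 2))
```

In this toy setting the summed reward should stay high around both expert modes and low between them. Each discriminator only needs to correct what the current policy still gets wrong, so earlier discriminators are kept rather than overwritten. A single uni-modal fit, by contrast, would tend to average the two solutions.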
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
