Versatile Inverse Reinforcement Learning via Cumulative Rewards

Niklas Freymuth; Philipp Becker; Gerhard Neumann

arXiv:2111.07667 · cs.LG · November 16, 2021

TL;DR

This paper introduces an inverse reinforcement learning method that represents the recovered reward as a sum of iteratively trained discriminators, so that it can capture versatile expert behavior instead of a single behavior and generalize better as a result.

Contribution

It proposes a novel IRL approach using a sum of discriminators to handle diverse behaviors, improving generalization over traditional uni-modal models.

Findings

01 Successfully recovers general, high-quality reward functions

02 Produces policies comparable to behavioral cloning for versatile behaviors

03 Demonstrates effectiveness on simulated tasks

Abstract

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting, where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
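To make the phrase "a sum of iteratively trained discriminators" concrete, here is a toy sketch on a five-state discrete problem. It is not the authors' implementation, and it makes several assumptions that do not come from the abstract: the expert and policy are plain probability vectors, each "trained" discriminator is replaced by the closed-form logit of the Bayes-optimal discriminator (log p - log q), and the policy is updated by a hypothetical damped multiplicative step controlled by `step`. The function names `optimal_logit` and `cumulative_reward_irl` are illustrative only.

```python
import numpy as np


def optimal_logit(p_expert, q_policy):
    """Logit of the Bayes-optimal expert-vs-policy discriminator: log p(x) - log q(x).

    Assumption: stands in for a discriminator that would normally be trained on samples.
    """
    return np.log(p_expert) - np.log(q_policy)


def cumulative_reward_irl(p_expert, q_init, n_iters=20, step=0.5):
    """Accumulate discriminator logits into a reward while the policy chases the expert."""
    q = q_init.copy()
    reward = np.zeros_like(q)
    for _ in range(n_iters):
        logit = optimal_logit(p_expert, q)  # discriminator for the current policy
        reward += logit                     # cumulative reward: sum over iterations
        q = q * np.exp(step * logit)        # hypothetical damped policy update
        q /= q.sum()                        # renormalize to a probability vector
    return reward, q


# Bimodal ("versatile") expert over five discrete states; uniform initial policy.
p_expert = np.array([0.40, 0.08, 0.04, 0.08, 0.40])
q_init = np.full(5, 0.2)
reward, q_final = cumulative_reward_irl(p_expert, q_init)
```

In this toy setting the summed logits rank both expert modes (states 0 and 4) above the states in between, and the policy converges to the bimodal expert distribution rather than collapsing onto one mode. With real demonstrations each logit would come from a learned classifier and the policy step from a reinforcement learning update.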

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning