Inverse Reinforcement Learning via Matching of Optimality Profiles

Luis Haug; Ivan Ovinnikov; Eugene Bykovets

arXiv:2011.09264 · cs.LG · November 20, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an inverse reinforcement learning method that learns a reward function from suboptimal and heterogeneous demonstrations by matching optimality profiles, so that reward inference remains possible when optimal demonstrations are scarce.

Contribution

It proposes a reward learning algorithm that matches optimality profiles by minimizing a Wasserstein distance, which allows it to learn from suboptimal demonstrations.
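
For orientation, the two quantities named here can be written out using their standard textbook definitions. This is not a statement of the paper's exact objective: the symbols (discount factor \gamma, rewards r_t, distributions \mu and \nu with CDFs F_\mu and F_\nu) are assumptions introduced for illustration, and the order of the Wasserstein distance the authors use is not given in this summary, so the order-1 case is shown.

```latex
% Cumulative discounted future reward from time step t
G_t = \sum_{k \ge 0} \gamma^{k} \, r_{t+k}

% Wasserstein-1 distance between two distributions on the real line,
% expressed through their CDFs or, equivalently, their quantile functions
W_1(\mu, \nu)
  = \int_{-\infty}^{\infty} \left| F_\mu(x) - F_\nu(x) \right| \, dx
  = \int_{0}^{1} \left| F_\mu^{-1}(q) - F_\nu^{-1}(q) \right| \, dq
```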

Findings

01

The method successfully learns reward functions from suboptimal demonstrations.

02

Policies trained on these rewards outperform the original demonstrations.

03

The approach works with a weak supervision signal: a distribution over the rewards (or cumulative discounted future rewards) collected during the demonstrations.

Abstract

The goal of inverse reinforcement learning (IRL) is to infer a reward function that explains the behavior of an agent performing a task. The assumption that most approaches make is that the demonstrated behavior is near-optimal. In many real-world scenarios, however, examples of truly optimal behavior are scarce, and it is desirable to effectively leverage sets of demonstrations of suboptimal or heterogeneous performance, which are easier to obtain. We propose an algorithm that learns a reward function from such demonstrations together with a weak supervision signal in the form of a distribution over rewards collected during the demonstrations (or, more generally, a distribution over cumulative discounted future rewards). We view such distributions, which we also refer to as optimality profiles, as summaries of the degree of optimality of the demonstrations that may, for example,…
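
To make the notion of an optimality profile concrete, the following minimal sketch builds an empirical distribution over cumulative discounted future rewards from reward sequences and compares two such distributions with an approximate one-dimensional Wasserstein-1 distance. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the discount factor `gamma`, and the quantile-based approximation are all hypothetical choices made here.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = sum_k gamma^k * r_{t+k} for every step t of one trajectory."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def optimality_profile(trajectories, gamma=0.99):
    """Pool the discounted returns of all trajectories into one sorted sample.

    The sorted sample serves as an empirical stand-in for the distribution
    that the abstract calls an optimality profile (an assumption of this sketch).
    """
    pooled = [discounted_returns(r, gamma) for r in trajectories]
    return np.sort(np.concatenate(pooled))


def wasserstein_1d(samples_a, samples_b, num_quantiles=200):
    """Approximate the Wasserstein-1 distance between two 1-D samples.

    Uses the quantile form of W_1: the mean absolute difference between the
    two quantile functions, evaluated on a midpoint grid over (0, 1).
    """
    qs = (np.arange(num_quantiles) + 0.5) / num_quantiles
    qa = np.quantile(samples_a, qs)
    qb = np.quantile(samples_b, qs)
    return float(np.mean(np.abs(qa - qb)))
```

In a reward learning loop one would, hypothetically, compute the profile induced by the current reward estimate on the demonstrations and adjust that estimate to shrink `wasserstein_1d` against the target profile; how the paper actually performs this minimization is not described in the truncated abstract above.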

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research