Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

Michael Herman; Tobias Gindele; Jörg Wagner; Felix Schmitt; Wolfram Burgard

arXiv:1604.03912·cs.AI·April 14, 2016·19 cites

Open Access

TL;DR

This paper introduces a gradient-based inverse reinforcement learning method that simultaneously estimates rewards and system dynamics, improving sample efficiency and accuracy when the transition model is unknown and the demonstrations alone provide few samples of the dynamics.

Contribution

It proposes a novel IRL approach that jointly estimates rewards and dynamics, addressing limitations of existing methods that assume known or accessible transition models.

Findings

01. Enhanced sample efficiency in reward and dynamics estimation
02. Improved accuracy of the learned reward functions and transition models
03. Successful application to synthetic MDP and transfer learning tasks

Abstract

Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the…
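The abstract's point that a policy depends on both the dynamics and the reward can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-state MDP, the names `soft_policy`, `P_reliable`, and `P_slippery`, and the choice of a Boltzmann (softmax) policy computed by soft value iteration. The sketch keeps the reward fixed and changes only the transition model, and the resulting policies differ, which suggests why an IRL method that assumes the wrong dynamics could attribute the difference to the reward instead.

```python
import numpy as np

def soft_policy(P, r, gamma=0.9, iters=500):
    """Boltzmann policy from soft value iteration (illustrative assumption).

    P: transition probabilities, shape (S, A, S)
    r: state-action rewards, shape (S, A)
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)                      # shape (S, A)
        m = Q.max(axis=1)                            # for a stable log-sum-exp
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    Q = r + gamma * (P @ V)
    pi = np.exp(Q - V[:, None])
    return pi / pi.sum(axis=1, keepdims=True)

# Hypothetical reward: being in state 1 pays 1, state 0 pays 0.
r = np.array([[0.0, 0.0],
              [1.0, 1.0]])

# Action 0 stays put; action 1 tries to switch to the other state.
P_reliable = np.array([[[1.0, 0.0], [0.0, 1.0]],
                       [[0.0, 1.0], [1.0, 0.0]]])   # switch always succeeds
P_slippery = np.array([[[1.0, 0.0], [0.8, 0.2]],
                       [[0.0, 1.0], [0.2, 0.8]]])   # switch succeeds 20% of the time

pi_a = soft_policy(P_reliable, r)
pi_b = soft_policy(P_slippery, r)

# Same reward, different dynamics, different behavior in state 0.
print("P(switch | state 0), reliable dynamics:", pi_a[0, 1])
print("P(switch | state 0), slippery dynamics:", pi_b[0, 1])
```

In this toy setting the agent switches out of state 0 less often when the switch action is unreliable, even though the reward is unchanged.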

Taxonomy

Topics: Reinforcement Learning in Robotics