Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard

TL;DR
This paper introduces a gradient-based inverse reinforcement learning method that estimates the reward function and the system dynamics simultaneously, improving sample efficiency and accuracy when the transition model is unknown and the observed behavior provides only limited samples of it.
Contribution
It proposes a novel IRL approach that jointly estimates rewards and dynamics, addressing limitations of existing methods that assume known or accessible transition models.
Findings
Enhanced sample efficiency in reward and dynamics estimation
Improved accuracy of the learned reward functions and transition models
Successful application to a synthetic MDP and a transfer learning task
Abstract
Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the…
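The abstract's premise, that an agent's policy depends on the system dynamics as well as on the reward, can be illustrated with a minimal sketch. It is not the paper's algorithm: the three-state chain, the `slip` parameter, the helper names `make_chain` and `soft_value_iteration`, and the softmax (Boltzmann) policy model are all assumptions chosen for illustration. With the reward held fixed, changing only the transition model reverses the preferred action, so an IRL method that assumes the wrong dynamics would explain the same behavior with a different reward.

```python
import numpy as np

def make_chain(slip, n_states=3):
    """Hypothetical chain MDP: action 0 = left, action 1 = right.
    The intended move happens with probability 1 - slip, the opposite with slip."""
    P = np.zeros((n_states, 2, n_states))  # P[s, a, s']
    for s in range(n_states):
        left, right = max(s - 1, 0), min(s + 1, n_states - 1)
        P[s, 0, left] += 1 - slip
        P[s, 0, right] += slip
        P[s, 1, right] += 1 - slip
        P[s, 1, left] += slip
    return P

def soft_value_iteration(P, r, gamma=0.9, iters=500):
    """Softmax policy for transitions P[s, a, s'] and state rewards r[s] (an assumed policy model)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)  # Q[s, a]
        q_max = Q.max(axis=1, keepdims=True)
        V = (q_max + np.log(np.exp(Q - q_max).sum(axis=1, keepdims=True))).ravel()
    Q = r[:, None] + gamma * (P @ V)
    policy = np.exp(Q - Q.max(axis=1, keepdims=True))
    return policy / policy.sum(axis=1, keepdims=True)

reward = np.array([0.0, 0.0, 1.0])  # same reward under both dynamics
policy_a = soft_value_iteration(make_chain(slip=0.0), reward)  # reliable moves
policy_b = soft_value_iteration(make_chain(slip=0.8), reward)  # moves mostly reversed

print("P(right | s=0), reliable dynamics:", policy_a[0, 1])
print("P(right | s=0), reversed dynamics:", policy_b[0, 1])
```

In the first model the agent in state 0 prefers "right"; in the second it prefers "left", although the reward is unchanged.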
Taxonomy
Topics: Reinforcement Learning in Robotics
