# Stochastic Inverse Reinforcement Learning

**Authors:** Ce Ju

arXiv: 1905.08513 · 2022-09-26

## TL;DR

This paper introduces stochastic inverse reinforcement learning (SIRL), a well-posed reformulation of IRL that recovers a probability distribution over reward functions from expert demonstrations rather than a single reward function. Sampling from that distribution yields alternative solutions to the original IRL problem.

## Contribution

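The paper generalizes IRL to a probabilistic framework and uses Monte Carlo expectation-maximization (MCEM) to estimate the parameter of the distribution over reward functions. It describes the resulting solution as succinct, robust, and transferable, and argues that the formulation exposes intrinsic properties of the IRL problem.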
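To make the MCEM idea concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the Gaussian distribution over linear reward weights, the softmax choice model, the importance-weighted E-step, and every name (`demo_log_likelihoods`, `mcem_fit`, `make_synthetic_demos`, `min_sigma`) are assumptions chosen for illustration. Each demonstration is simplified to a single choice among a few options rather than a trajectory in an MDP.

```python
import numpy as np

def demo_log_likelihoods(samples, features, choices, beta=1.0):
    """Log-likelihood of the expert's choices for each sampled weight vector.

    samples:  (n_samples, d) candidate reward weights
    features: (n_demos, n_options, d) feature vector of every option
    choices:  (n_demos,) index of the option the expert picked
    Assumes a softmax (Boltzmann) expert; this is an illustrative model.
    """
    scores = beta * np.einsum("nkd,sd->snk", features, samples)
    scores = scores - scores.max(axis=2, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=2, keepdims=True))
    chosen = log_probs[:, np.arange(len(choices)), choices]  # (n_samples, n_demos)
    return chosen.sum(axis=1)

def mcem_fit(features, choices, n_iters=30, n_samples=500, min_sigma=0.05, seed=0):
    """Fit a diagonal Gaussian over reward weights with Monte Carlo EM."""
    rng = np.random.default_rng(seed)
    d = features.shape[-1]
    mu, sigma = np.zeros(d), np.ones(d)
    for _ in range(n_iters):
        # Monte Carlo E-step: sample weights from the current distribution and
        # weight each sample by how well it explains the demonstrations.
        samples = mu + sigma * rng.standard_normal((n_samples, d))
        log_lik = demo_log_likelihoods(samples, features, choices)
        weights = np.exp(log_lik - log_lik.max())
        weights /= weights.sum()
        # M-step: weighted maximum-likelihood update of the Gaussian parameters.
        mu = weights @ samples
        var = weights @ (samples - mu) ** 2
        sigma = np.maximum(np.sqrt(var), min_sigma)  # floor keeps sampling alive
    return mu, sigma

def make_synthetic_demos(w_true, n_demos=200, n_options=4, seed=1):
    """Hypothetical expert that picks options via a softmax over true rewards."""
    rng = np.random.default_rng(seed)
    w_true = np.asarray(w_true, dtype=float)
    features = rng.standard_normal((n_demos, n_options, len(w_true)))
    scores = features @ w_true
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    choices = np.array([rng.choice(n_options, p=p) for p in probs])
    return features, choices

if __name__ == "__main__":
    features, choices = make_synthetic_demos([1.0, -0.5])
    mu, sigma = mcem_fit(features, choices)
    print("estimated mean:", mu, "estimated std:", sigma)
    # Drawing from the fitted distribution gives alternative reward weights.
    print("one sampled alternative:", np.random.default_rng(2).normal(mu, sigma))
```

The floor on `sigma` is a design choice of this sketch: with a single dataset the fitted distribution tends to collapse, and the floor keeps some spread so that sampling still produces distinct candidates.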

## Key findings

- Achieves good performance on the objectworld benchmark.
- Provides a global perspective on IRL's intrinsic properties.
- Generates multiple alternative solutions to IRL.

## Abstract

The goal of the inverse reinforcement learning (IRL) problem is to recover reward functions from expert demonstrations. However, like any ill-posed inverse problem, IRL suffers from an inherent defect: a policy may be optimal for many reward functions, and expert demonstrations may be optimal for many policies. In this work, we generalize the IRL problem to a well-posed expectation optimization problem, stochastic inverse reinforcement learning (SIRL), which recovers a probability distribution over reward functions. We adopt the Monte Carlo expectation-maximization (MCEM) method to estimate the parameter of the probability distribution as the first solution to the SIRL problem. The solution is succinct, robust, and transferable for a learning task and can generate alternative solutions to the IRL problem. Our formulation makes it possible to observe the intrinsic properties of the IRL problem from a global viewpoint, and our approach achieves strong performance on the objectworld benchmark.
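The ill-posedness described in the abstract can be seen on a tiny example. The sketch below assumes a hypothetical 5-state chain MDP with deterministic left/right moves and state-based rewards (none of this comes from the paper) and uses value iteration to show that three different reward functions induce the same optimal policy.

```python
import numpy as np

def chain_transitions(n_states):
    """Deterministic chain: action 0 steps left, action 1 steps right."""
    P = np.zeros((2, n_states, n_states))
    for s in range(n_states):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, n_states - 1)] = 1.0
    return P

def optimal_policy(P, r, gamma=0.9, n_iters=500):
    """Greedy policy after value iteration.

    P: (n_actions, n_states, n_states) transition probabilities
    r: (n_states,) reward received in each state
    """
    V = np.zeros(P.shape[1])
    for _ in range(n_iters):
        Q = r[None, :] + gamma * (P @ V)  # (n_actions, n_states)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

if __name__ == "__main__":
    P = chain_transitions(5)
    goal_only = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    rewards = {
        "goal only": goal_only,
        "positive affine transform": 3.0 * goal_only + 2.0,
        "different shape": np.array([0.0, 0.1, 0.2, 0.3, 1.0]),
    }
    for name, r in rewards.items():
        # Every reward above yields the same policy: always move right.
        print(name, optimal_policy(P, r))
```

Observing only the "always move right" policy therefore cannot distinguish among these rewards, which is the motivation for recovering a distribution over reward functions instead of a single one.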

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.08513/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1905.08513/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1905.08513/full.md

---
Source: https://tomesphere.com/paper/1905.08513