Maximum Entropy Semi-Supervised Inverse Reinforcement Learning
Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh

TL;DR
This paper introduces MESSI, a semi-supervised extension of MaxEnt-IRL, which leverages both expert and unsupervised trajectories to improve inverse reinforcement learning performance.
Contribution
The paper proposes MESSI, a novel algorithm that integrates semi-supervised learning principles into MaxEnt-IRL using pairwise trajectory penalties.
Findings
MESSI outperforms MaxEnt-IRL when unsupervised trajectories are available in addition to the expert's trajectories.
Empirical results on highway driving and grid-world tasks show the improvement.
Abstract
A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and, unlike its predecessors, resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which, in addition to the expert's trajectories, a number of unsupervised trajectories are available. We introduce MESSI, a novel algorithm that combines MaxEnt-IRL with principles coming from semi-supervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results on highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and…
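To make the idea of a pairwise penalty on trajectories concrete, here is a minimal sketch in Python. The abstract does not give the form of the penalty, so everything specific here is an assumption made for illustration: rewards are taken to be linear in trajectory features, the penalty is a similarity-weighted squared reward gap, the similarity is an RBF kernel on feature vectors, and the MaxEnt likelihood is normalized over a small finite set of trajectories. All function names (`reward`, `similarity`, `pairwise_penalty`, `maxent_log_likelihood`, `messi_style_objective`) and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math

def reward(theta, features):
    """Linear trajectory reward theta . f (assumed form)."""
    return sum(t * f for t, f in zip(theta, features))

def similarity(f_a, f_b, bandwidth=1.0):
    """Hypothetical RBF similarity between two trajectory feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_a, f_b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def pairwise_penalty(theta, trajectories, bandwidth=1.0):
    """Sum over unordered pairs of similarity times squared reward gap.

    Similar trajectories that receive very different rewards are penalized.
    """
    total = 0.0
    for i in range(len(trajectories)):
        for j in range(i + 1, len(trajectories)):
            gap = reward(theta, trajectories[i]) - reward(theta, trajectories[j])
            total += similarity(trajectories[i], trajectories[j], bandwidth) * gap ** 2
    return total

def maxent_log_likelihood(theta, expert, candidates):
    """Mean log-probability of the expert trajectories when
    P(trajectory) is proportional to exp(reward), normalized over a
    finite candidate set (an illustrative simplification)."""
    log_z = math.log(sum(math.exp(reward(theta, f)) for f in candidates))
    return sum(reward(theta, f) - log_z for f in expert) / len(expert)

def messi_style_objective(theta, expert, unlabeled, lam=0.1):
    """Expert likelihood minus a weighted pairwise penalty over all trajectories."""
    everything = expert + unlabeled
    return (maxent_log_likelihood(theta, expert, everything)
            - lam * pairwise_penalty(theta, everything))

# Toy data: each trajectory is summarized by a 2-dimensional feature vector.
expert = [[1.0, 0.0]]
unlabeled = [[0.9, 0.1], [0.0, 1.0]]
theta = [1.0, -1.0]
print(messi_style_objective(theta, expert, unlabeled, lam=0.1))
```

In this sketch, setting `lam` to zero recovers the plain likelihood term, so the unsupervised trajectories influence the learned reward only through the penalty (and, here, through the simplified normalizer). Maximizing the objective over `theta` would require an optimizer, which is left out.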
