Hybrid Inverse Reinforcement Learning

Juntao Ren; Gokul Swamy; Zhiwei Steven Wu; J. Andrew Bagnell; Sanjiban Choudhury

arXiv:2402.08848 · cs.LG · June 6, 2024 · 2 citations

TL;DR

This paper introduces hybrid inverse reinforcement learning, combining online and expert data to improve sample efficiency and reduce unnecessary exploration in inverse RL, without requiring environment resets.

Contribution

The paper proposes using hybrid RL, that is, training on a mixture of online and expert data, inside the inverse RL loop to reduce computational cost and unnecessary exploration, backed by formal guarantees and empirical validation.

Findings

01. Significantly more sample efficient than standard inverse RL
02. Effective on continuous control tasks
03. Maintains strong policy performance with less exploration

Abstract

The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of this computation is wasted searching over policies very dissimilar to the expert's. In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration. Intuitively, the expert data focuses the learner on good states during training, which reduces the amount of exploration required to compute a strong policy. Notably, such an approach doesn't need the ability to reset the learner to arbitrary states in the environment, a requirement…
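To make "training on a mixture of online and expert data" concrete, here is a minimal sketch of how a training batch might be drawn from two sources. It is an illustration only, not the authors' implementation and not code from the jren03/garage repository: the function name `sample_hybrid_batch`, the `expert_fraction` parameter, its 0.5 default, and the tuple-based transition format are all assumptions made for this example.

```python
import random


def sample_hybrid_batch(expert_data, online_data, batch_size,
                        expert_fraction=0.5, rng=None):
    """Draw one training batch that mixes expert and online transitions.

    Hypothetical helper: the name, signature, and default mixing ratio
    are assumptions for illustration, not taken from the paper.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must lie in [0, 1]")
    if not expert_data:
        raise ValueError("expert_data must not be empty")
    rng = rng or random.Random()

    # Decide how many samples come from each source.
    n_expert = int(round(batch_size * expert_fraction))
    if not online_data:
        # Before any online rollouts exist, fall back to expert data only.
        n_expert = batch_size
    n_online = batch_size - n_expert

    # Sample with replacement from each buffer, then shuffle the result
    # so the learner does not see the two sources in a fixed order.
    batch = [rng.choice(expert_data) for _ in range(n_expert)]
    batch += [rng.choice(online_data) for _ in range(n_online)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Toy transitions tagged by source so the mixture is easy to inspect.
    expert = [("expert", i) for i in range(10)]
    online = [("online", i) for i in range(10)]
    batch = sample_hybrid_batch(expert, online, batch_size=8,
                                rng=random.Random(0))
    n_from_expert = sum(1 for source, _ in batch if source == "expert")
    print(f"{n_from_expert} expert / {len(batch) - n_from_expert} online")
```

In this sketch the expert samples keep every update anchored to states the expert visited, which is the intuition the abstract gives for reduced exploration. Nothing here requires resetting the environment to arbitrary states, since the online data comes from ordinary rollouts.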

Code & Models

Repositories

jren03/garage (PyTorch, official)

Taxonomy

Topics: Adaptive Dynamic Programming Control