SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy; Anca D. Dragan; Sergey Levine

arXiv:1905.11108 · cs.LG · September 27, 2019 · 53 cites

TL;DR

SQIL is a simple imitation learning method that trains a reinforcement learning agent with sparse rewards rather than a learned reward function. It outperforms behavioral cloning and achieves results competitive with GAIL.

Contribution

The paper introduces SQIL, a novel RL-based imitation learning algorithm that avoids reward function learning and encourages long-horizon imitation through sparse rewards.

Findings

01 SQIL outperforms behavioral cloning in various tasks.

02 SQIL achieves competitive results compared to GAIL.

03 SQIL is simple to implement with minor modifications to standard RL algorithms.
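
To make the "minor modifications" in the last finding concrete, here is a minimal sketch of the replay-buffer change. It assumes the reward scheme described in the full paper: demonstration transitions get a constant reward of +1 and the agent's own transitions get a constant reward of 0. The abstract excerpt on this page is cut off before that detail, so treat it as an assumption. The 50/50 batch split is likewise assumed, and the names `Transition`, `SQILBuffer`, and `label` are hypothetical, not taken from the authors' code.

```python
import random
from collections import namedtuple

# Hypothetical container for one environment step.
Transition = namedtuple("Transition", "state action reward next_state done")


def label(transitions, reward):
    """Return copies of the transitions with the reward overwritten by a constant."""
    return [t._replace(reward=reward) for t in transitions]


class SQILBuffer:
    """Replay buffer sketch: demonstrations carry reward 1, agent experience reward 0."""

    def __init__(self, demos, seed=0):
        self.demo = label(demos, 1.0)   # assumption: constant reward +1 for demonstrations
        self.agent = []                 # filled as the agent interacts with the environment
        self.rng = random.Random(seed)

    def add(self, state, action, next_state, done):
        # Assumption: the environment reward is never used; agent steps get reward 0.
        self.agent.append(Transition(state, action, 0.0, next_state, done))

    def sample(self, batch_size):
        # Assumed balanced batch: half demonstrations, half agent experience,
        # both sampled with replacement. Returns only demonstrations if the
        # agent has not collected any experience yet.
        half = batch_size // 2
        batch = self.rng.choices(self.demo, k=half)
        if self.agent:
            batch += self.rng.choices(self.agent, k=batch_size - half)
        return batch


# Usage: seed the buffer with one demonstration, add one agent step, draw a batch.
buffer = SQILBuffer([Transition(state=0, action=1, reward=None, next_state=1, done=False)])
buffer.add(state=1, action=0, next_state=2, done=False)
batch = buffer.sample(4)
```

Batches drawn this way would then be passed unchanged to an off-policy Q-learning update. That update is the standard RL component and is not shown here.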

Abstract

Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent methods based on reinforcement learning (RL), such as inverse RL and generative adversarial imitation learning (GAIL), overcome this issue by training an RL agent to match the demonstrations over a long horizon. Since the true reward function for the task is unknown, these methods learn a reward function from the demonstrations, often using complex and brittle approximation techniques that involve adversarial training. We propose a simple alternative that still uses RL, but does not require…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks

Methods: Q-Learning