SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Siddharth Reddy, Anca D. Dragan, Sergey Levine

TL;DR
SQIL is a simple imitation learning method that uses reinforcement learning with sparse rewards to imitate expert behavior without learning a reward function. It outperforms behavioral cloning and achieves results competitive with GAIL.
Contribution
The paper introduces SQIL, a novel RL-based imitation learning algorithm that avoids reward function learning and encourages long-horizon imitation through sparse rewards.
Findings
SQIL outperforms behavioral cloning in various tasks.
SQIL achieves competitive results compared to GAIL.
SQIL is simple to implement with minor modifications to standard RL algorithms.
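To make the "minor modifications" concrete: as described in the SQIL paper, demonstration transitions are stored with a constant reward of +1, the agent's own transitions with a constant reward of 0, and each training batch draws equally from both. The sketch below illustrates only that replay-buffer bookkeeping. The names `Transition`, `SQILBuffers`, `add_agent_transition`, and `sample` are hypothetical, chosen for illustration, and the Q-learning update that would consume the batch is left out. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
import random
from collections import namedtuple

# Hypothetical container for one step of experience.
Transition = namedtuple("Transition", ["state", "action", "next_state", "reward"])


class SQILBuffers:
    """Illustrative pair of replay buffers with constant rewards."""

    def __init__(self, demo_transitions):
        # Demonstration transitions are labeled with a constant reward of +1.
        self.demo = [Transition(s, a, s2, 1.0) for (s, a, s2) in demo_transitions]
        # Agent transitions are collected during training.
        self.agent = []

    def add_agent_transition(self, state, action, next_state):
        # The agent's own experience is labeled with a constant reward of 0.
        self.agent.append(Transition(state, action, next_state, 0.0))

    def sample(self, batch_size, rng=random):
        # Balanced batch: half demonstrations first, then agent experience.
        half = batch_size // 2
        demo_batch = [rng.choice(self.demo) for _ in range(half)]
        if not self.agent:
            # Before any agent experience exists, return demonstrations only.
            return demo_batch
        agent_batch = [rng.choice(self.agent) for _ in range(batch_size - half)]
        return demo_batch + agent_batch
```

A batch returned by `sample` could then be passed to a standard off-policy update (the paper uses a soft Q-learning variant). The environment's true reward is never consulted.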
Abstract
Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent methods based on reinforcement learning (RL), such as inverse RL and generative adversarial imitation learning (GAIL), overcome this issue by training an RL agent to match the demonstrations over a long horizon. Since the true reward function for the task is unknown, these methods learn a reward function from the demonstrations, often using complex and brittle approximation techniques that involve adversarial training. We propose a simple alternative that still uses RL, but does not require…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
Methods: Q-Learning
