Fast Policy Learning through Imitation and Reinforcement
Ching-An Cheng, Xinyan Yan, Nolan Wagener, Byron Boots

TL;DR
This paper introduces LOKI, a hybrid policy learning algorithm that runs a small, random number of imitation learning iterations before switching to policy gradient reinforcement learning. With a properly randomized switching time, it converges faster than policy gradient from scratch and can outperform a suboptimal expert.
Contribution
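The paper formulates several popular RL and IL algorithms in a common mirror descent framework, showing they can be viewed as variations on a single approach, and proposes LOKI, a strategy that performs IL first and then switches to a policy gradient RL method.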
Findings
LOKI converges faster than running policy gradient RL from scratch.
LOKI can learn to outperform a suboptimal expert.
Both results depend on the switching time from IL to RL being properly randomized.
Abstract
Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as a variation on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of…
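The two-phase structure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `loki_schedule`, `run_loki`, `il_step`, `rl_step`, `k_min`, and `k_max` are hypothetical, and the switching iteration is drawn uniformly here purely as a placeholder, since the summary above does not state which distribution the paper uses.

```python
import random
from typing import Callable, List, TypeVar

P = TypeVar("P")  # stand-in for whatever object represents a policy


def loki_schedule(total_iters: int, k_min: int, k_max: int,
                  rng: random.Random) -> List[str]:
    """Label each iteration "IL" or "RL", switching once at a random point.

    Assumption: the switch point is uniform on [k_min, k_max]; the source
    only says the number of IL iterations is "small but random".
    """
    if not 0 <= k_min <= k_max <= total_iters:
        raise ValueError("need 0 <= k_min <= k_max <= total_iters")
    switch_at = rng.randint(k_min, k_max)  # inclusive on both ends
    return ["IL"] * switch_at + ["RL"] * (total_iters - switch_at)


def run_loki(policy: P,
             il_step: Callable[[P], P],
             rl_step: Callable[[P], P],
             total_iters: int, k_min: int, k_max: int,
             seed: int = 0) -> P:
    """Apply IL updates first, then policy gradient RL updates.

    il_step and rl_step are hypothetical callbacks: each takes the current
    policy and returns the updated one.
    """
    rng = random.Random(seed)
    for phase in loki_schedule(total_iters, k_min, k_max, rng):
        policy = il_step(policy) if phase == "IL" else rl_step(policy)
    return policy
```

In a real setting `il_step` would perform one imitation update against the expert and `rl_step` one policy gradient update; here they are left as callbacks so the sketch stays independent of any particular learner or environment.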
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
