Learning from Interventions using Hierarchical Policies for Safe Learning
Jing Bi, Vikas Dhiman, Tianyou Xiao, Chenliang Xu

TL;DR
This paper introduces a hierarchical policy framework for Learning from Interventions that interpolates expert reactions and predicts sub-goals, enabling safer, faster, and more effective learning in complex tasks.
Contribution
The paper proposes a hierarchical approach that combines sub-goal prediction with interpolation of expert interventions, improving Learning from Interventions (LfI) beyond standard Behavior Cloning (BC).
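The two-level split can be pictured with a toy sketch. It is not the authors' implementation: a 2D point agent stands in for the task, and the hypothetical functions `high_level_policy` (proposes a sub-goal at most `horizon` units ahead) and `low_level_policy` (outputs a velocity toward that sub-goal, capped at `max_speed`) are hand-coded stand-ins for the learned networks. The `replan_every` parameter is likewise an assumption, used only to show the high level committing to a sub-goal for several low-level steps.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def high_level_policy(position: Vec, goal: Vec, horizon: float) -> Vec:
    """Hypothetical stand-in for the learned sub-goal predictor: propose a
    point at most `horizon` units ahead on the straight line to the goal."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= horizon:
        return goal
    scale = horizon / dist
    return (position[0] + dx * scale, position[1] + dy * scale)


def low_level_policy(position: Vec, sub_goal: Vec, max_speed: float) -> Vec:
    """Hypothetical stand-in for the learned action predictor: a velocity
    command toward the current sub-goal, capped at `max_speed`."""
    dx, dy = sub_goal[0] - position[0], sub_goal[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= max_speed:
        return (dx, dy)
    return (dx / dist * max_speed, dy / dist * max_speed)


def rollout(start: Vec, goal: Vec, horizon: float = 2.0, max_speed: float = 0.5,
            replan_every: int = 4, max_steps: int = 100) -> List[Vec]:
    """Run the two levels together and return the visited positions."""
    position = start
    sub_goal = start
    path = [position]
    for t in range(max_steps):
        # The high level only refreshes its sub-goal every `replan_every` steps.
        if t % replan_every == 0:
            sub_goal = high_level_policy(position, goal, horizon)
        # The low level acts every step to reach the current sub-goal.
        action = low_level_policy(position, sub_goal, max_speed)
        position = (position[0] + action[0], position[1] + action[1])
        path.append(position)
        if math.hypot(goal[0] - position[0], goal[1] - position[1]) < 1e-9:
            break
    return path
```

For example, `rollout((0.0, 0.0), (3.0, 4.0))` reaches the goal through intermediate sub-goals. In a learned setting, the high level is what carries longer-term intent, while the low level only has to solve the short-horizon problem of reaching the next sub-goal.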
Findings
Faster training than LfD.
Better asymptotic performance.
Robustness to expert reaction delays.
Abstract
Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well on multiple complex tasks. However, a limitation of the typical LfD approach is that it requires expert demonstrations for all scenarios, including those in which the algorithm is already well-trained. The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer. The expert overseer intervenes only when it suspects that an unsafe action is about to be taken. Although LfI significantly improves over LfD, the state-of-the-art LfI fails to account for the delay caused by the expert's reaction time and learns only short-term behavior. We address these limitations by 1) interpolating the expert's interventions back in time, and 2) splitting the policy into two hierarchical levels, one that generates sub-goals for the future and another that generates actions to reach those…
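The idea of interpolating interventions back in time can be sketched as a relabeling step over one LfI episode. This is a minimal sketch under stated assumptions, not the paper's exact rule: the reaction delay is taken as a fixed `delay_steps`, actions are scalars, and the unlabeled states inside that window are given targets that blend linearly from the learner's action at the start of the window to the expert's first corrective action. The names `Step` and `interpolate_interventions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, ...]


@dataclass
class Step:
    state: State                    # observation at this time step
    agent_action: float             # action the learner actually took
    expert_action: Optional[float]  # expert's action, or None if no intervention


def interpolate_interventions(steps: List[Step], delay_steps: int) -> List[Tuple[State, float]]:
    """Build (state, target_action) training pairs from one LfI episode.

    Hypothetical rule: at each intervention onset, the `delay_steps` states
    before it are relabeled by linearly blending the learner's action at the
    start of that window into the expert's first corrective action.
    """
    targets: List[Optional[float]] = [s.expert_action for s in steps]
    for t, step in enumerate(steps):
        # An onset is an intervened step whose predecessor was not intervened.
        is_onset = step.expert_action is not None and (
            t == 0 or steps[t - 1].expert_action is None
        )
        if not is_onset:
            continue
        start = max(0, t - delay_steps)
        a_start = steps[start].agent_action
        a_expert = step.expert_action
        span = t - start
        for k in range(start, t):
            if targets[k] is None:  # never overwrite a genuine expert label
                alpha = (k - start + 1) / (span + 1)
                targets[k] = (1.0 - alpha) * a_start + alpha * a_expert
    # Steps with neither an expert label nor an interpolated one give no supervision.
    return [(s.state, a) for s, a in zip(steps, targets) if a is not None]
```

With `delay_steps=0` this reduces to plain LfI labeling, where only intervened states are supervised. A positive delay also supervises the states in which the unsafe behavior most likely began, before the expert could react.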
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Machine Learning and Data Classification
