Imitating Unknown Policies via Exploration
Nathan Gavenski, Juarez Monteiro, Roger Granada, Felipe Meneguzzi, Rodrigo C. Barros

TL;DR
This paper introduces a two-phase, exploration-based model that improves behavioral cloning from unlabeled observations by avoiding bad local minima and exploring more effectively, outperforming the previous state-of-the-art in four environments.
Contribution
It proposes a novel two-phase model with sampling and self-attention mechanisms to improve imitation learning from unlabeled observations.
Findings
Outperforms previous state-of-the-art in four environments
Uses sampling to avoid local minima and enhance exploration
Incorporates self-attention for capturing global features
Abstract
Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision of fully-observable unlabeled snapshots of the states to decode state-pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state-of-the-art in four different environments by a large margin.
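One way to picture a sampling mechanism that improves exploration is to draw the action from the model's predicted distribution instead of always taking the most likely one. The sketch below is only an illustration under that assumption and is not the authors' implementation: the function names, the three-action setup, and the logit values are all hypothetical, standing in for the output of a model that decodes a state-pair into action scores.

```python
import math
import random


def softmax(logits):
    """Turn raw action scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    """Always pick the highest-scoring action (no exploration)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sampled_action(logits, rng):
    """Draw an action in proportion to its predicted probability."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three actions, as if decoded from one state-pair.
logits = [2.0, 1.5, 0.1]
rng = random.Random(0)  # fixed seed so the run is repeatable

counts = [0, 0, 0]
for _ in range(1000):
    counts[sampled_action(logits, rng)] += 1

print("greedy choice:", greedy_action(logits))
print("sampled counts:", counts)
```

The greedy rule would select the first action every time, whereas sampling still visits the less likely actions occasionally, which is the kind of behavior that can keep an iterative labeling scheme from reinforcing its own early mistakes.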
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
