Imitating Unknown Policies via Exploration

Nathan Gavenski; Juarez Monteiro; Roger Granada; Felipe Meneguzzi; Rodrigo C. Barros

arXiv:2008.05660·cs.LG·August 14, 2020·1 cites

TL;DR

This paper introduces a two-phase, exploration-based model that improves behavioral cloning from unlabeled observations by using sampling to avoid bad local minima and to improve exploration. It outperforms the previous state of the art in four environments.

Contribution

It proposes a novel two-phase model with sampling and self-attention mechanisms to improve imitation learning from unlabeled observations.

Findings

1. Outperforms the previous state of the art in four environments
2. Uses sampling to avoid bad local minima and enhance exploration
3. Incorporates self-attention modules to capture global features

Abstract

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision of fully-observable unlabeled snapshots of the states to decode state pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state of the art in four different environments by a large margin.
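The abstract does not say how the sampling mechanism is implemented. As a rough illustration only, the sketch below assumes one common approach: instead of always taking the highest-scoring action (argmax), the agent draws an action in proportion to its softmax probability, so lower-ranked actions are still tried occasionally. The function names `softmax`, `greedy_action`, and `sampled_action` are hypothetical and are not taken from the paper or its repositories.

```python
import math
import random


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    """Always pick the highest-scoring action (no exploration)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sampled_action(logits, rng):
    """Assumed sampling rule: draw an action index with probability
    equal to its softmax weight, so every action keeps some chance."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With equal scores such as `[0.0, 0.0, 0.0]`, `greedy_action` returns the same index every time, whereas `sampled_action(logits, random.Random(0))` spreads its choices across all three actions. That difference is the intuition behind using sampling for exploration, though the paper's actual mechanism may differ.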

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning