Learning from Demonstration without Demonstrations

Tom Blau; Gilad Francis; Philippe Morere

arXiv:2106.09203·cs.LG·June 18, 2021



TL;DR

This paper introduces P2D2, a method that automatically discovers demonstrations through planning algorithms, reducing the need for expert input and improving reinforcement learning efficiency in control tasks.

Contribution

The paper presents P2D2, a novel planning-based approach for automatic demonstration discovery that enhances RL training without requiring expert demonstrations.

Findings

1. Outperforms classic exploration RL methods on control tasks
2. Requires fewer samples for effective learning
3. Achieves better asymptotic performance

Abstract

State-of-the-art reinforcement learning (RL) algorithms suffer from high sample complexity, particularly in the sparse reward case. A popular strategy for mitigating this problem is to learn control policies by imitating a set of expert demonstrations. The drawback of such approaches is that an expert needs to produce demonstrations, which may be costly in practice. To address this shortcoming, we propose Probabilistic Planning for Demonstration Discovery (P2D2), a technique for automatically discovering demonstrations without access to an expert. We formulate discovering demonstrations as a search problem and leverage widely used planning algorithms such as Rapidly-exploring Random Tree to find demonstration trajectories. These demonstrations are used to initialize a policy, which is then refined by a generic RL algorithm. We provide theoretical guarantees of P2D2 finding successful…
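To make the "demonstration discovery as search" idea concrete, here is a minimal sketch of RRT-based demonstration discovery under loudly stated assumptions: a hypothetical toy environment (a point in the unit square with deterministic dynamics s' = s + a, actions clipped to a maximum step length, and a wall that states may not enter) whose state can be extended from any tree node directly. The names step, valid, steer, rrt_demonstration, MAX_STEP and WALL are illustrative and are not taken from the P2D2 code in the GitLab repository. Steering straight toward each random sample is a simplification and not necessarily how P2D2 expands its tree.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]

# Hypothetical toy environment (an assumption, not from the paper):
# a point in the unit square moves by a bounded displacement each step,
# and any transition whose resulting state lies inside the wall is rejected.
MAX_STEP = 0.05
WALL = (0.45, 0.55, 0.0, 0.7)  # x_min, x_max, y_min, y_max


def step(s: State, a: Action) -> State:
    """Deterministic toy dynamics: s' = s + a."""
    return (s[0] + a[0], s[1] + a[1])


def valid(s: State) -> bool:
    """A state is valid if it lies in the unit square and outside the wall."""
    x, y = s
    in_bounds = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
    in_wall = WALL[0] <= x <= WALL[1] and WALL[2] <= y <= WALL[3]
    return in_bounds and not in_wall


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    action: Optional[Action] = None  # action taken from parent to reach state


def steer(src: State, target: State) -> Action:
    """Action pointing from src toward target, clipped to length MAX_STEP."""
    dx, dy = target[0] - src[0], target[1] - src[1]
    dist = math.hypot(dx, dy)
    if dist <= MAX_STEP:
        return (dx, dy)
    return (dx * MAX_STEP / dist, dy * MAX_STEP / dist)


def backtrack(node: Node) -> List[Tuple[State, Action]]:
    """Follow parent links to the root; return (state, action) pairs in order."""
    pairs = []
    while node.parent is not None:
        pairs.append((node.parent.state, node.action))
        node = node.parent
    pairs.reverse()
    return pairs


def rrt_demonstration(start: State, goal: State, goal_radius: float = 0.05,
                      goal_bias: float = 0.1, max_iters: int = 10_000,
                      seed: int = 0) -> Optional[List[Tuple[State, Action]]]:
    """Grow an RRT from start; return a demonstration once a node nears the goal."""
    rng = random.Random(seed)
    tree = [Node(start)]
    for _ in range(max_iters):
        # Sample a random state, occasionally the goal itself (goal biasing).
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.random(), rng.random())
        # Extend the nearest tree node by one bounded step toward the sample.
        nearest = min(tree, key=lambda n: math.dist(n.state, sample))
        action = steer(nearest.state, sample)
        new_state = step(nearest.state, action)
        if not valid(new_state):
            continue  # the toy environment rejects this transition
        node = Node(new_state, nearest, action)
        tree.append(node)
        if math.dist(new_state, goal) <= goal_radius:
            return backtrack(node)
    return None  # no demonstration found within the iteration budget


demo = rrt_demonstration(start=(0.1, 0.1), goal=(0.9, 0.1))
if demo is not None:
    print(f"found a {len(demo)}-step demonstration")
```

The returned (state, action) pairs are the kind of data a policy could be initialized from, for example by behaviour cloning (an assumption here; the abstract only says the demonstrations initialize a policy), before a generic RL algorithm refines it. The sketch stops at the discovery step.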


Code & Models

Repositories

https://gitlab.com/tomblau/p2d2
Official
