Augmenting Policy Learning with Routines Discovered from a Single Demonstration

Zelin Zhao; Chuang Gan; Jiajun Wu; Xiaoxiao Guo; Joshua B. Tenenbaum

arXiv:2012.12469·cs.LG·May 4, 2021

TL;DR

This paper introduces RAPL, a method that discovers routines from a single demonstration to enhance policy learning, improving imitation and reinforcement learning performance and generalization across tasks.

Contribution

The paper presents a novel routine discovery approach from minimal data and integrates it into policy learning at multiple temporal scales.

Findings

01 RAPL outperforms SQIL, a state-of-the-art imitation learning method.

02 RAPL improves reinforcement learning performance when combined with A2C.

03 The discovered routines generalize to unseen levels and difficulties.

Abstract

Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses the discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by identifying a grammar over the demonstrated action trajectory. Then, the best routines, as measured by length and frequency, are selected to form a routine library. We propose to learn the policy simultaneously at the primitive level and the routine level using the discovered routines, leveraging their temporal structure. Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes exploration in reinforcement learning. Extensive experiments on Atari games demonstrate that…
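The routine-discovery step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes identifying a grammar over the action trajectory, whereas the code below swaps in plain counting of repeated contiguous action subsequences as a simpler stand-in. The function names (`discover_routines`, `augment_action_space`), the length bounds, and the `length * frequency` score are all assumptions made for illustration.

```python
from collections import Counter


def discover_routines(actions, min_len=2, max_len=4, top_k=2):
    """Return the top_k repeated action subsequences from one demonstration.

    Hypothetical stand-in for grammar-based discovery: candidates are
    contiguous subsequences, scored by length * frequency (an assumed score).
    Occurrences are counted with overlap, so "AAAA" contains "AA" three times.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1

    # A routine must occur at least twice to be worth reusing.
    repeated = {seq: c for seq, c in counts.items() if c >= 2}

    # Sort by score (descending), then by length (descending), then
    # lexicographically so that ties are broken deterministically.
    ranked = sorted(
        repeated,
        key=lambda seq: (-len(seq) * repeated[seq], -len(seq), seq),
    )
    return ranked[:top_k]


def augment_action_space(primitives, routines):
    """Build an action list holding primitives (as 1-tuples) plus routines.

    Selecting a routine would mean executing its primitive actions in order,
    which is how a policy could act at a coarser temporal scale.
    """
    return [(a,) for a in primitives] + list(routines)


if __name__ == "__main__":
    # Toy demonstration with made-up action labels.
    demo = ["R", "R", "J", "L", "R", "R", "J", "F", "R", "R", "J"]
    library = discover_routines(demo)
    print(library)
    print(augment_action_space(["L", "R", "J", "F"], library))
```

In this toy trajectory the subsequence `("R", "R", "J")` occurs three times, so it receives the highest score under the assumed ranking. A policy over the augmented action list can then choose either a single primitive or a whole routine at each decision point.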

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification

Methods: A2C