Automated curriculum generation for Policy Gradients from Demonstrations

Anirudh Srinivasan; Dzmitry Bahdanau; Maxime Chevalier-Boisvert; Yoshua Bengio

arXiv:1912.00444 · cs.LG · December 3, 2019 · 1 citation

TL;DR

This paper introduces an automated curriculum generation method for reinforcement learning agents that leverages demonstrations and goal-directed training, inspired by human learning, to enhance sample efficiency in instruction-following tasks.

Contribution

It proposes a novel curriculum generation technique from demonstrations that improves RL training efficiency, inspired by human backward learning strategies.

Findings

01. Improved sample efficiency on BabyAI tasks
02. Effective use of demonstrations in curriculum learning
03. Outperforms PPO baseline in specific tasks

Abstract

In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that parallels one of the ways in which humans learn to perform complex tasks, i.e., by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.
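The "starting from the goal and working backwards" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BackwardCurriculum`, the success threshold, and the evaluation window are all hypothetical choices made for illustration. The sketch assumes the agent can be reset to any point along an expert demonstration, and it moves that reset point one step further from the goal each time the recent success rate clears the threshold.

```python
from dataclasses import dataclass


@dataclass
class BackwardCurriculum:
    """Tracks how far before the goal each training episode starts.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    demo_length: int                 # number of actions in the expert demonstration
    success_threshold: float = 0.8   # assumed criterion for making the task harder
    window: int = 20                 # assumed number of episodes per evaluation
    steps_from_goal: int = 1         # start one action away from the goal

    def __post_init__(self):
        self._results = []           # outcomes of episodes in the current window

    def start_index(self) -> int:
        """Index into the demonstration from which the agent is reset."""
        return max(self.demo_length - self.steps_from_goal, 0)

    def record(self, success: bool) -> None:
        """Log one episode outcome; move the start point back if warranted."""
        self._results.append(success)
        if len(self._results) < self.window:
            return
        rate = sum(self._results) / len(self._results)
        self._results.clear()
        if rate >= self.success_threshold and self.steps_from_goal < self.demo_length:
            self.steps_from_goal += 1

    def finished(self) -> bool:
        """True once episodes start from the beginning of the demonstration."""
        return self.steps_from_goal >= self.demo_length


# Usage: a stand-in loop where every episode succeeds.
curriculum = BackwardCurriculum(demo_length=5, window=2)
while not curriculum.finished():
    curriculum.record(True)  # in practice: the outcome of one RL episode
```

In a real training loop, `start_index()` would select the demonstration state to reset the environment to, and the policy-gradient update (for example PPO) would run on episodes collected from that start state.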

Code & Models

Repositories

Genius1237/babyai (PyTorch, official)

Taxonomy

Topics: Teaching and Learning Programming · Machine Learning and Algorithms · Intelligent Tutoring Systems and Adaptive Learning

Methods: Test · Entropy Regularization · Proximal Policy Optimization