Automated curriculum generation for Policy Gradients from Demonstrations
Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, and Yoshua Bengio

TL;DR
This paper introduces an automated curriculum generation method for reinforcement learning agents that leverages demonstrations and goal-directed training, inspired by human learning, to enhance sample efficiency in instruction-following tasks.
Contribution
It proposes a novel curriculum generation technique from demonstrations that improves RL training efficiency, inspired by human backward learning strategies.
Findings
Improved sample efficiency on BabyAI tasks
Effective use of demonstrations in curriculum learning
Outperforms the PPO baseline in sample efficiency on some tasks
Abstract
In this paper, we present a technique that improves the process of training an agent with reinforcement learning (RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that parallels one of the ways in which humans learn to perform complex tasks, i.e., by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.
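The abstract does not spell out how the curriculum is scheduled, so the following is only a minimal sketch of one way a "start from the goal and work backwards" curriculum over a single demonstration could look. It assumes the environment can be reset to an intermediate state of a demonstration, and that a start point moves one step further from the goal once the agent's recent success rate passes a threshold. The class name `BackwardCurriculum` and the parameters `window`, `threshold`, and `step` are hypothetical and are not taken from the paper or from BabyAI.

```python
from collections import deque


class BackwardCurriculum:
    """Hypothetical scheduler: choose where in a demonstration an episode starts.

    A demonstration with `demo_length` actions visits states 0..demo_length,
    where state `demo_length` is the goal. Training begins one action away
    from the goal and the start point moves toward state 0 as the agent
    succeeds more often.
    """

    def __init__(self, demo_length, window=20, threshold=0.8, step=1):
        self.demo_length = demo_length
        self.steps_from_goal = 1            # current distance of the start state from the goal
        self.window = window                # number of recent episodes used for the success rate
        self.threshold = threshold          # success rate required before moving backwards
        self.step = step                    # how many demonstration steps to move back at once
        self.results = deque(maxlen=window)

    def start_index(self):
        """Index of the demonstration state the next episode should start from."""
        return max(0, self.demo_length - self.steps_from_goal)

    def record(self, success):
        """Log one episode outcome and move the start point back if warranted."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        if window_full and sum(self.results) / self.window >= self.threshold:
            if self.steps_from_goal < self.demo_length:
                self.steps_from_goal = min(self.demo_length,
                                           self.steps_from_goal + self.step)
                self.results.clear()        # re-estimate success at the new difficulty

    def done(self):
        """True once episodes start from the demonstration's initial state."""
        return self.steps_from_goal >= self.demo_length
```

In a training loop, `start_index()` would select the demonstration state to reset the environment to before each episode, and `record()` would be called with the episode outcome after a policy-gradient update such as PPO. Clearing the result window after each move is a design choice that prevents successes at an easier start point from counting toward the next one.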
Taxonomy
Topics: Teaching and Learning Programming · Machine Learning and Algorithms · Intelligent Tutoring Systems and Adaptive Learning
Methods: Test · Entropy Regularization · Proximal Policy Optimization
