Learning Curriculum Policies for Reinforcement Learning

Sanmit Narvekar; Peter Stone

arXiv:1812.00285·cs.LG·September 17, 2019·48 cites


TL;DR

This paper introduces a method for automatically learning curriculum policies in reinforcement learning. It models task sequencing as a Markov Decision Process, so that agents train faster on complex target tasks.

Contribution

The paper extends a recent MDP-based curriculum design model to handle multiple transfer learning algorithms, and shows that a curriculum policy over this MDP can be learned from experience.

Findings

01. Curriculum policies trained with our method accelerate agent learning.

02. Our approach outperforms existing curriculum methods in speed.

03. The method is effective across multiple domains and agents.

Abstract

Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task, by first training on a series of simpler tasks and transferring the knowledge acquired to the target task. Automatically choosing a sequence of such tasks (i.e. a curriculum) is an open problem that has been the subject of much recent work in this area. In this paper, we build upon a recent method for curriculum design, which formulates the curriculum sequencing problem as a Markov Decision Process. We extend this model to handle multiple transfer learning algorithms, and show for the first time that a curriculum policy over this MDP can be learned from experience. We explore various representations that make this possible, and evaluate our approach by learning curriculum policies for multiple agents in two different domains. The results show that our…
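As a rough illustration of the curriculum-as-MDP idea described in the abstract, the sketch below builds a toy curriculum MDP and learns a curriculum policy over it with tabular Q-learning. It is not the authors' implementation. Everything in it is an assumption made for illustration: the task names, the `BASE_COST` and `TRANSFER` tables, the choice to summarize the agent's knowledge as the set of mastered tasks, and the use of tabular Q-learning as the learner. Each action picks the next task to train on, the reward is the negative training cost, and an episode ends once the target task has been trained.

```python
import random
from collections import defaultdict

# Hypothetical toy tasks: steps needed to master each task from scratch.
BASE_COST = {"easy": 10, "medium": 40, "target": 200}
# Hypothetical transfer benefit: mastering `source` first multiplies the
# cost of `task` by the given factor. Keys are (source, task) pairs.
TRANSFER = {("easy", "medium"): 0.5, ("medium", "target"): 0.25, ("easy", "target"): 0.8}
TARGET = "target"


def train_cost(task, mastered):
    """Steps needed to master `task` given the set of already mastered tasks."""
    cost = float(BASE_COST[task])
    for source in mastered:
        cost *= TRANSFER.get((source, task), 1.0)
    return cost


def available_tasks(mastered):
    """Actions of the curriculum MDP: any task not yet mastered."""
    return [t for t in BASE_COST if t not in mastered]


def step(mastered, task):
    """One curriculum MDP transition: train on `task` and pay its cost."""
    reward = -train_cost(task, mastered)
    return mastered | {task}, reward, task == TARGET


def learn_curriculum_policy(episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    """Undiscounted tabular Q-learning over (mastered set, next task) pairs."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = frozenset(), False
        while not done:
            actions = available_tasks(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(
                q[(next_state, a)] for a in available_tasks(next_state)
            )
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
    return q


def greedy_curriculum(q):
    """Read off the task sequence chosen by the greedy curriculum policy."""
    state, done, curriculum = frozenset(), False, []
    while not done:
        action = max(available_tasks(state), key=lambda a: q[(state, a)])
        curriculum.append(action)
        state, _, done = step(state, action)
    return curriculum


if __name__ == "__main__":
    print(greedy_curriculum(learn_curriculum_policy()))
```

In this toy setup, training directly on the target costs 200 steps, while the sequence easy, medium, target costs 10 + 20 + 40 = 70 steps, so a learned curriculum policy should prefer the longer sequence. The paper's actual state representations and transfer learning algorithms are not reproduced here.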


Code & Models

Repositories

holman57/ML-Education


Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · Evolutionary Algorithms and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings