Discrete-to-Deep Supervised Policy Learning

Budi Kurniawan; Peter Vamplew; Michael Papasimeon; Richard Dazeley; Cameron Foale

arXiv:2005.02057 · cs.LG · May 6, 2020 · 1 citation



TL;DR

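D2D-SPL trains neural-network policies for reinforcement learning by discretising the continuous state space, learning a policy over the discrete states with actor-critic, and then training a classifier on one input/target pair per discrete state. The authors report faster learning than state-of-the-art methods, without experience replay.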

Contribution

The paper proposes a discretisation-based supervised policy learning method that trains a neural network with a single agent, without experience replay or asynchronous parallel agents, and learns faster than state-of-the-art methods.

Findings

01

Faster learning compared to state-of-the-art methods

02

Effective in Cartpole and aircraft manoeuvring environments

03

Uses a single agent and needs no experience replay

Abstract

Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally, it uses input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method with two RL environments, the Cartpole and an aircraft manoeuvring…
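The pipeline described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation (their official PyTorch code is in the budi-kurniawan/d2d-spl repository). Assumptions, all hypothetical: a one-dimensional toy task `step` replaces Cartpole; a uniform grid of `N_BINS` cells performs the discretisation; the bin centre is taken as each discrete state's input value; tabular one-step actor-critic with softmax action preferences learns the policy; and a single-layer perceptron stands in for the neural-network classifier.

```python
import math
import random

N_BINS = 10     # hypothetical grid resolution for a 1-D state in [0, 1)
N_ACTIONS = 2


def discretise(x):
    """Map a continuous state x in [0, 1) to one of N_BINS discrete states."""
    return min(int(x * N_BINS), N_BINS - 1)


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(x, action):
    """Hypothetical one-step toy task: action 1 is rewarded when x >= 0.5,
    action 0 otherwise. Returns (next_state, reward, done)."""
    correct = 1 if x >= 0.5 else 0
    return x, (1.0 if action == correct else 0.0), True


def train_actor_critic(episodes=5000, alpha=0.1, beta=0.1, gamma=0.99, seed=0):
    """Tabular one-step actor-critic over the discrete states.
    V is the critic; H holds the actor's numerical action preferences."""
    rng = random.Random(seed)
    V = [0.0] * N_BINS
    H = [[0.0] * N_ACTIONS for _ in range(N_BINS)]
    for _ in range(episodes):
        x, done = rng.random(), False
        while not done:
            s = discretise(x)
            probs = softmax(H[s])
            a = rng.choices(range(N_ACTIONS), weights=probs)[0]
            x, r, done = step(x, a)
            target = r if done else r + gamma * V[discretise(x)]
            delta = target - V[s]               # TD error
            V[s] += alpha * delta               # critic update
            for b in range(N_ACTIONS):          # softmax policy-gradient actor update
                H[s][b] += beta * delta * ((1.0 if b == a else 0.0) - probs[b])
    return H


def extract_pairs(H):
    """One (input, target) pair per discrete state: the bin centre (an assumed
    representative input) and the action with the highest preference."""
    return [((s + 0.5) / N_BINS, prefs.index(max(prefs))) for s, prefs in enumerate(H)]


def train_perceptron(pairs, epochs=2000):
    """Single-layer perceptron standing in for the paper's neural-network classifier."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in pairs:
            if (1 if w * x + b > 0 else 0) != y:
                sign = 1.0 if y == 1 else -1.0
                w, b = w + sign * x, b + sign
                mistakes += 1
        if mistakes == 0:               # every training pair classified correctly
            break
    return w, b


def predict(params, x):
    w, b = params
    return 1 if w * x + b > 0 else 0


pairs = extract_pairs(train_actor_critic())
clf = train_perceptron(pairs)
print([y for _, y in pairs])                 # targets taken from the actor's preferences
print([predict(clf, x) for x, _ in pairs])   # classifier outputs on the same inputs
```

In this sketch each discrete state contributes exactly one input/target pair, so the classifier is fitted to a small fixed dataset rather than to a stream of correlated transitions; this appears to be what lets the method dispense with experience replay.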


Code & Models

Repositories

budi-kurniawan/d2d-spl
PyTorch · Official


Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Smart Grid Energy Management

Methods: Experience Replay