Discrete-to-Deep Supervised Policy Learning
Budi Kurniawan, Peter Vamplew, Michael Papasimeon, Richard Dazeley, Cameron Foale

TL;DR
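D2D-SPL trains a neural network for reinforcement learning in two stages: it first learns an actor-critic policy over a discretised state space, then trains a classifier on input/target pairs drawn from that policy. The authors report faster learning than state-of-the-art methods, using a single agent and no experience replay.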
Contribution
The paper proposes a discretisation-based supervised policy learning method that needs no experience replay and, according to the authors, accelerates training in RL.
Findings
Faster learning compared to state-of-the-art methods
Effective in Cartpole and aircraft manoeuvring environments
Requires only a single agent without experience replay
Abstract
Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally it uses input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method with two RL environments, the Cartpole and an aircraft manoeuvring…
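The pipeline described in the abstract (discretise, learn preferences per discrete state, extract input/target pairs, train a classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform grid, the function names (`discretise`, `build_pairs`, `train_softmax`, `predict`), the choice of one stored representative state per cell, and the use of softmax regression in place of a neural network are all hypothetical. The actor-critic stage is not shown; its output is assumed to be a table of action preferences per visited cell.

```python
import numpy as np


def discretise(state, low, high, bins):
    """Map a continuous state to a tuple of bin indices.

    Assumption: a uniform grid with `bins` cells per dimension.
    """
    state, low, high = (np.asarray(a, dtype=float) for a in (state, low, high))
    ratios = (state - low) / (high - low)
    idx = np.clip(np.floor(ratios * bins).astype(int), 0, bins - 1)
    return tuple(int(i) for i in idx)


def build_pairs(preferences, representatives):
    """Build one input/target pair per visited discrete state.

    preferences:     dict cell -> array of action preferences
                     (assumed to come from a tabular actor-critic run).
    representatives: dict cell -> one continuous state seen in that cell
                     (hypothetical selection rule).
    """
    inputs, targets = [], []
    for cell, prefs in preferences.items():
        inputs.append(representatives[cell])
        targets.append(int(np.argmax(prefs)))  # action with highest preference
    return np.array(inputs, dtype=float), np.array(targets, dtype=int)


def train_softmax(X, y, n_actions, lr=0.5, epochs=500):
    """Fit a softmax-regression classifier by batch gradient descent.

    Stand-in for the neural-network classifier; kept linear for brevity.
    """
    n, d = X.shape
    W = np.zeros((d, n_actions))
    b = np.zeros(n_actions)
    Y = np.eye(n_actions)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(W, b, X):
    """Return the greedy action for each row of X."""
    return np.argmax(np.asarray(X, dtype=float) @ W + b, axis=1)
```

In this sketch the supervised stage sees each discrete state once, so the training set is small and its samples are not drawn from a correlated trajectory, which is the property the abstract relies on to avoid experience replay.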
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Smart Grid Energy Management
Methods: Experience Replay
