Discrete Sequential Prediction of Continuous Actions for Deep RL

Luke Metz; Julian Ibarz; Navdeep Jaitly; James Davidson

arXiv:1705.05035 · cs.LG · June 11, 2019 · 70 cites

Open Access

TL;DR

This paper discretizes each dimension of a continuous action space and predicts the dimensions sequentially, one at a time, which makes a global search over actions tractable in deep reinforcement learning. The authors report state-of-the-art results.

Contribution

The paper proposes modeling high-dimensional continuous actions by predicting one discretized dimension at a time, which avoids the exponentially large number of bins that discretizing all dimensions jointly would require.

Findings

01. Demonstrates effective global search in continuous control tasks.

02. Achieves state-of-the-art performance on several benchmarks.

03. Outperforms existing off-policy methods like DDPG.

Abstract

It has long been assumed that high dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space due to the exponentially large number of bins over which policies would have to be learned. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that complex functions over high dimensional spaces can be modeled by neural networks that predict one dimension at a time. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions. With this parameterization, it is possible to both leverage the compositional structure of action spaces during learning, as well as compute maxima over action…
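The idea of predicting one discretized dimension at a time can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_conditional_q` is a hypothetical stand-in for a learned network, and the bin count, dimension count, and the [-1, 1] action range are illustrative assumptions. The sketch selects an action greedily, one dimension at a time, with each choice conditioned on the bins already chosen, and compares the number of values scored against the size of a joint discretization.

```python
import numpy as np

NUM_DIMS = 3   # action dimensions (illustrative value)
NUM_BINS = 5   # bins per dimension (illustrative value)
BIN_CENTERS = np.linspace(-1.0, 1.0, NUM_BINS)  # assumes actions lie in [-1, 1]


def toy_conditional_q(state, prefix, dim):
    """Hypothetical stand-in for a learned network.

    Returns one value per bin of dimension `dim`, conditioned on the state
    and on the bins already chosen for earlier dimensions (`prefix`).
    """
    # Toy rule: the preferred value for this dimension depends on the state
    # and on the mean of the previously chosen action values.
    prev = np.mean([BIN_CENTERS[b] for b in prefix]) if prefix else 0.0
    target = np.clip(state[dim] - 0.5 * prev, -1.0, 1.0)
    return -(BIN_CENTERS - target) ** 2


def sequential_greedy_action(state, q_fn=toy_conditional_q):
    """Pick bins one dimension at a time, each conditioned on earlier picks."""
    prefix = []
    for dim in range(NUM_DIMS):
        values = q_fn(state, prefix, dim)       # NUM_BINS values for this dim
        prefix.append(int(np.argmax(values)))   # greedy choice for this dim
    return prefix, BIN_CENTERS[prefix]


state = np.array([0.9, -0.2, 0.4])
bins, action = sequential_greedy_action(state)

joint_entries = NUM_BINS ** NUM_DIMS     # size of a joint discretization
sequential_evals = NUM_BINS * NUM_DIMS   # values scored by the sequential search

print("chosen bins:", bins)
print("continuous action:", action)
print("joint entries:", joint_entries, "| sequential evaluations:", sequential_evals)
```

In this sketch the sequential search scores a number of values that grows linearly with the number of dimensions, while a joint table over all bin combinations grows exponentially. The greedy per-dimension maximum matches a maximum over the joint space only under the assumption that each conditional value already accounts for the best choices in the remaining dimensions.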

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications

Methods: Weight Decay · Convolution · Adam · Dense Connections · Batch Normalization · Experience Replay · Deep Deterministic Policy Gradient