Discrete Sequential Prediction of Continuous Actions for Deep RL
Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson

TL;DR
This paper introduces a novel sequential prediction approach for discretized continuous action spaces in deep reinforcement learning, enabling effective global search and achieving state-of-the-art results.
Contribution
It proposes a new method that models high-dimensional continuous actions by predicting one dimension at a time, improving over traditional discretization.
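The size gap between the two discretization schemes can be illustrated with a small sketch. The function names, the action range of [-1, 1], and the choice of 6 dimensions with 32 bins are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def joint_bin_count(num_dims, bins_per_dim):
    # Discretizing every dimension jointly yields one output per
    # combination of bins, which grows exponentially with num_dims.
    return bins_per_dim ** num_dims

def sequential_output_count(num_dims, bins_per_dim):
    # Predicting one dimension at a time needs only bins_per_dim
    # outputs per step, so the total grows linearly with num_dims.
    return num_dims * bins_per_dim

def bin_centers(low, high, bins_per_dim):
    # Map each bin index back to a continuous value (the bin midpoint).
    edges = np.linspace(low, high, bins_per_dim + 1)
    return (edges[:-1] + edges[1:]) / 2

if __name__ == "__main__":
    print(joint_bin_count(6, 32))          # joint discretization
    print(sequential_output_count(6, 32))  # one dimension at a time
    print(bin_centers(-1.0, 1.0, 4))
```

With these assumed numbers, the joint scheme needs over a billion outputs while the sequential scheme needs 192.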
Findings
Demonstrates effective global search in continuous control tasks.
Achieves state-of-the-art performance on several benchmarks.
Outperforms existing off-policy methods such as Deep Deterministic Policy Gradient (DDPG).
Abstract
It has long been assumed that high dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space due to the exponentially large number of bins over which policies would have to be learned. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that complex functions over high dimensional spaces can be modeled by neural networks that predict one dimension at a time. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions. With this parameterization, it is possible to both leverage the compositional structure of action spaces during learning, as well as compute maxima over action…
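As a rough illustration of next-step prediction over discretized dimensions, the sketch below picks an action greedily, one dimension at a time. It is not the authors' implementation: `q_fn`, `make_toy_q_fn`, and `greedy_sequential_action` are hypothetical names, and the toy Q-function stands in for a learned network that would condition on the state and on the bins already chosen.

```python
import numpy as np

def greedy_sequential_action(q_fn, state, num_dims, centers):
    # Choose one bin per action dimension. Each step sees the bins
    # chosen so far (the prefix), so it costs num_dims argmaxes over
    # len(centers) values instead of one argmax over every combination.
    prefix = []
    for _ in range(num_dims):
        q_values = q_fn(state, prefix)       # shape: (len(centers),)
        prefix.append(int(np.argmax(q_values)))
    return centers[np.array(prefix)]

def make_toy_q_fn(centers):
    # Hypothetical stand-in for a learned model: for dimension d it
    # scores each bin by closeness of its center to state[d].
    def q_fn(state, prefix):
        d = len(prefix)
        return -(centers - state[d]) ** 2
    return q_fn

if __name__ == "__main__":
    centers = np.array([-0.75, -0.25, 0.25, 0.75])  # assumed 4 bins on [-1, 1]
    state = np.array([0.3, -0.8])
    action = greedy_sequential_action(make_toy_q_fn(centers), state, 2, centers)
    print(action)
```

The toy Q-function ignores the contents of the prefix and uses only its length; a learned model would use the earlier choices to capture dependencies between action dimensions.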
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
Methods: Weight Decay · Convolution · Adam · Dense Connections · Batch Normalization · Experience Replay · Deep Deterministic Policy Gradient
