Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient
Kevin Sebastian Luck, Mel Vecerik, Simon Stepputtis, Heni Ben Amor, Jonathan Scholz

TL;DR
This paper introduces a novel approach combining latent trajectory optimization with Deep Deterministic Policy Gradient (DDPG) to improve exploration efficiency in continuous control tasks, demonstrated on simulated and real-world robotic tasks.
Contribution
It extends DDPG with a learned deep dynamics model and a trajectory optimizer, creating a symbiotic system that enhances exploration and learning in image-based environments.
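The interplay described above, where a planner rolls candidate actions through a learned dynamics model and scores them with the critic, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: random shooting stands in for the trajectory optimizer, and `plan_action`, `toy_dynamics`, `toy_reward`, and `toy_value` are hypothetical names with hand-written toy models in place of learned networks.

```python
import numpy as np

def plan_action(z0, dynamics, reward, value, horizon=5, n_candidates=64,
                action_dim=2, gamma=0.99, rng=None):
    """Random-shooting planner in latent space (an assumed stand-in optimizer).

    Samples action sequences, rolls each one through the dynamics model, and
    scores it by discounted reward plus a discounted terminal critic value.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Candidate action sequences, shape (n_candidates, horizon, action_dim).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    z = np.repeat(z0[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        a = actions[:, t]
        returns += (gamma ** t) * reward(z, a)
        z = dynamics(z, a)  # predicted next latent state
    # The critic estimates the value beyond the planning horizon.
    returns += (gamma ** horizon) * value(z)
    best = int(np.argmax(returns))
    # Execute only the first action of the best sequence (exploration action).
    return actions[best, 0], returns[best]

# Hypothetical toy models; in practice these would be learned networks.
def toy_dynamics(z, a):
    return z + 0.1 * a

def toy_reward(z, a):
    return -np.sum(z ** 2, axis=1)

def toy_value(z):
    return -np.sum(z ** 2, axis=1)

z0 = np.array([1.0, -1.0])
action, ret = plan_action(z0, toy_dynamics, toy_reward, toy_value)
```

The returned action would be executed in the environment as the exploration action, and the resulting transition stored in the replay buffer for the DDPG update. Using the critic as a terminal value keeps the planning horizon short.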
Findings
Enhanced exploration improves task performance.
Effective in both simulation and real-world robotic tasks.
Demonstrates benefits of model-based planning in deep RL.
Abstract
Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation…
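One way to read the value-function critic mentioned in the abstract is sketched below. The notation is assumed for illustration and is not taken from the paper: z_t is the latent state, f the learned dynamics model, r a reward function, V the value-function critic, and \pi_\theta the deterministic actor. The action-value is approximated by a one-step lookahead through the model, and the policy gradient follows from the chain rule through that lookahead.

```latex
Q(z_t, a_t) \approx r(z_t, a_t) + \gamma\, V\big(f(z_t, a_t)\big),
\qquad
\nabla_\theta J \approx \mathbb{E}_{z_t}\Big[
  \nabla_\theta \pi_\theta(z_t)^{\top}\,
  \nabla_a \big( r(z_t, a) + \gamma\, V(f(z_t, a)) \big)\Big|_{a = \pi_\theta(z_t)}
\Big]
```

Under this reading, the gradient with respect to the action passes through the dynamics model f rather than through a learned Q-function, which is why the model is needed to compute the policy gradient.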
Taxonomy
Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient
