Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Kevin Sebastian Luck; Mel Vecerik; Simon Stepputtis; Heni Ben Amor; Jonathan Scholz

arXiv:1911.06833·cs.LG·November 19, 2019

TL;DR

This paper combines latent trajectory optimization with Deep Deterministic Policy Gradient (DDPG) to improve exploration in continuous control, and evaluates the approach on simulated and real-world robotic tasks.

Contribution

The paper extends DDPG with a learned deep dynamics model and a trajectory optimizer. The two components support each other: the planner improves exploration for the RL algorithm, and the learned critic guides the planner, in image-based environments.

Findings

01 Enhanced exploration improves task performance.

02 Effective in both simulation and real-world robotic tasks.

03 Demonstrates benefits of model-based planning in deep RL.

Abstract

Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is deterministic. This work evaluates model-based trajectory optimization methods for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived that uses a value function as critic and a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer: the trajectory optimizer benefits from the critic learned by the RL algorithm, and the RL algorithm benefits from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation…
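To make the interplay between planner and critic concrete, the following is a minimal sketch, not the authors' implementation. It assumes a random-shooting planner (the abstract does not name the optimizer, so this is a stand-in technique), a toy linear latent dynamics model, and a quadratic reward and value function. All names (`plan_first_action`, `toy_dynamics`, `toy_reward`, `toy_value`) and all parameter values are hypothetical. The planner rolls candidate action sequences through the dynamics model in latent space and scores each rollout with discounted rewards plus the critic's value at the final latent state; the first action of the best sequence could then serve as an exploratory action.

```python
import numpy as np


def plan_first_action(z0, dynamics, reward, value, action_dim,
                      horizon=5, n_candidates=256, gamma=0.99, seed=0):
    """Random-shooting planner in latent space (illustrative stand-in).

    Returns the first action of the best sampled sequence and its score.
    """
    rng = np.random.default_rng(seed)
    # Candidate action sequences in [-1, 1]: (n_candidates, horizon, action_dim).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    # Every candidate starts from the same latent state.
    z = np.repeat(z0[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        a = actions[:, t]
        returns += gamma ** t * reward(z, a)  # discounted model reward
        z = dynamics(z, a)                    # one step of the latent model
    # The critic scores the final latent state, standing in for the
    # return beyond the planning horizon.
    returns += gamma ** horizon * value(z)
    best = int(np.argmax(returns))
    return actions[best, 0], float(returns[best])


# Hypothetical toy models, used only to make the sketch runnable.
def toy_dynamics(z, a):
    return z + 0.1 * a                # linear latent transition (assumption)


def toy_reward(z, a):
    return -np.sum(z ** 2, axis=1)    # reward for being near the origin


def toy_value(z):
    return -np.sum(z ** 2, axis=1)    # stand-in for a learned critic


z0 = np.array([1.0, -1.0])
action, best_return = plan_first_action(
    z0, toy_dynamics, toy_reward, toy_value, action_dim=2)
print(action, best_return)
```

In a full agent, the dynamics model, reward model, and critic would be learned networks operating on a latent image embedding, and the planned action would be executed in place of (or mixed with) the deterministic actor's output during data collection. How the paper weighs these choices is not stated in the text above.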

Taxonomy

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient