Maximum Entropy Differential Dynamic Programming
Oswin So, Ziyi Wang, Evangelos A. Theodorou

TL;DR
This paper introduces a maximum entropy-based extension of Differential Dynamic Programming that enables better exploration of complex cost landscapes with multiple local minima, improving solution quality.
Contribution
It proposes a novel maximum entropy formulation for DDP with unimodal and multimodal value functions, allowing escape from local minima through exploration.
Findings
Outperforms vanilla DDP on multi-minima tasks
Enables exploration with multimodal policies
Connects with linearly solvable stochastic control
Abstract
In this paper, we present a novel maximum entropy formulation of the Differential Dynamic Programming algorithm and derive two variants using unimodal and multimodal value function parameterizations. By combining the maximum entropy Bellman equations with a particular approximation of the cost function, we obtain a new formulation of Differential Dynamic Programming that can escape from local minima via exploration with a multimodal policy. To demonstrate the efficacy of the proposed algorithm, we provide experimental results on four systems, using tasks whose cost functions have multiple local minima, and compare them against vanilla Differential Dynamic Programming. Furthermore, we discuss connections with previous work on the linearly solvable stochastic control framework and its extensions in relation to compositionality.
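As a rough illustration of the maximum entropy idea mentioned in the abstract, the sketch below evaluates a one-step, discretized problem: given costs `q` for a handful of candidate controls, the entropy-regularized value is a soft minimum (a scaled log-sum-exp) and the corresponding policy is a Boltzmann distribution over controls. This is not the paper's algorithm; the function name `soft_value_and_policy`, the temperature `alpha`, and the cost values are assumptions chosen only for illustration.

```python
import numpy as np


def soft_value_and_policy(q, alpha):
    """Soft-min value and Boltzmann policy for discrete control costs.

    q     : costs of each candidate control (hypothetical values)
    alpha : temperature weighting the entropy term (assumed > 0)
    """
    q = np.asarray(q, dtype=float)
    z = -q / alpha
    # Numerically stable log-sum-exp of z.
    z_max = z.max()
    log_norm = z_max + np.log(np.exp(z - z_max).sum())
    value = -alpha * log_norm      # soft minimum of q
    policy = np.exp(z - log_norm)  # pi(u) proportional to exp(-q(u) / alpha)
    return value, policy


# Hypothetical cost with two equally good minima (indices 1 and 3).
q = [4.0, 1.0, 3.0, 1.0, 5.0]
value, policy = soft_value_and_policy(q, alpha=0.5)
print("soft value:", value)
print("policy:", policy)
```

With two equal minima, the policy places equal mass on both, which is the kind of multimodality a greedy minimizer discards. As `alpha` shrinks, the soft value approaches the hard minimum and the policy concentrates on the minimizers.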
Taxonomy
Topics: Adaptive Dynamic Programming Control · Economic theories and models · Reinforcement Learning in Robotics
