Variational Dynamic Programming for Stochastic Optimal Control
Marc Lambert (SIERRA), Francis Bach (SIERRA), Silvère Bonnabel (CAOR)

TL;DR
This paper casts entropy-penalized stochastic optimal control as a variational inference problem, derives a dynamic programming principle in which the value function is a KL divergence, and uses Gaussian policy approximations on control-affine nonlinear systems with quadratic costs, generalizing LQR control.
Contribution
It presents a variational dynamic programming method for stochastic control based on KL divergence, extending LQR and the backward Riccati equation to control-affine nonlinear systems through closed-form recursive updates.
Findings
Derived a dynamic programming principle based on KL divergence.
Developed Gaussian-based recursive control updates.
Successfully stabilized an inverted pendulum using the proposed method.
Abstract
We consider the problem of stochastic optimal control, where the state-feedback control policies take the form of a probability distribution and where a penalty on the entropy is added. By viewing the cost function as a Kullback-Leibler (KL) divergence between two joint distributions, we bring the tools from variational inference to bear on our optimal control problem. This allows for deriving a dynamic programming principle, where the value function is defined as a KL divergence again. We then resort to Gaussian distributions to approximate the control policies and apply the theory to control-affine nonlinear systems with quadratic costs. This results in closed-form recursive updates, which generalize LQR control and the backward Riccati equation. We illustrate this novel method on the simple problem of stabilizing an inverted pendulum.
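For context on what the abstract's updates generalize, here is a minimal sketch of the classical finite-horizon, discrete-time LQR backward Riccati recursion. It is the textbook baseline, not the paper's variational method. The function name `lqr_backward_riccati`, the Euler-discretized pendulum linearized about the upright position, and all numerical values are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def lqr_backward_riccati(A, B, Q, R, Q_T, horizon):
    """Classical LQR for x_{t+1} = A x_t + B u_t with stage cost
    x'Qx + u'Ru and terminal cost x'Q_T x (textbook baseline only).

    Returns the time-ordered feedback gains K_t (u_t = -K_t x_t)
    and the cost-to-go matrix P_0 at the initial time.
    """
    P = Q_T
    gains = []
    for _ in range(horizon):
        # Gain from the current cost-to-go: K = (R + B'PB)^{-1} B'PA
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        # Backward Riccati update: P <- Q + A'PA - A'PB K
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # computed backward in time, returned forward in time
    return gains, P

# Assumed toy model: pendulum linearized about upright, Euler step dt.
dt = 0.05
g_over_l = 9.81  # assumed gravity / length ratio
A = np.array([[1.0, dt], [g_over_l * dt, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

gains, P0 = lqr_backward_riccati(A, B, Q, R, Q_T=Q, horizon=200)
closed_loop = A - B @ gains[0]
open_loop_radius = max(abs(np.linalg.eigvals(A)))
closed_loop_radius = max(abs(np.linalg.eigvals(closed_loop)))
```

Under these assumptions the open-loop spectral radius exceeds 1 (the upright equilibrium is unstable), while the closed-loop spectral radius under the first gain falls below 1.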
Taxonomy
Topics: Economic theories and models · Advanced Control Systems Optimization · Reinforcement Learning in Robotics
