Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation
Tom Lefebvre, Guillaume Crevecoeur

TL;DR
This paper links entropy regularisation in optimisation to deterministic optimal control, providing a theoretical foundation for the sample-based trajectory optimisation methods used to control robotic systems with non-differentiable dynamics and cost functions.
Contribution
It establishes that the optimal policy is a belief function governed by Bayesian updates, connecting sample-based methods to control as inference and justifying common heuristics.
Findings
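The optimal policy is a belief function rather than a deterministic function.
Sample-based trajectory optimisation is rooted in the larger family of control-as-inference methods.
The theoretical analysis justifies heuristics that are common in the literature and motivates improvements that benefit convergence.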
Abstract
Sample-based trajectory optimisers are a promising tool for the control of robotics with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control where the optimal policy can be expressed in terms of an expectation over stochastic paths. By estimating the expectation with Monte Carlo sampling and reinterpreting the process as exploration noise, a stochastic search algorithm is obtained tailored to (deterministic) trajectory optimisation. For the purpose of future algorithmic development, it is essential to properly understand the underlying theoretical foundations that allow for a principled derivation of such methods. In this paper we make a connection between entropy regularisation in optimisation and deterministic optimal control. We then show that the optimal policy is given by a belief function rather…
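The Monte Carlo step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it is a generic path-integral-style update, assuming Gaussian exploration noise around a mean control sequence and exponential weights exp(-cost / temperature). The function names, the toy 1D dynamics, the cost, and every parameter value are hypothetical choices made only for this illustration.

```python
import numpy as np

def rollout_cost(x0, controls, step, cost):
    """Accumulate the cost of one control sequence under deterministic dynamics."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = step(x, u)
    return total

def sample_based_update(x0, mean, sigma, step, cost,
                        n_samples=256, temperature=1.0, rng=None):
    """One stochastic-search iteration: sample, roll out, reweight, average."""
    rng = np.random.default_rng() if rng is None else rng
    # Exploration noise around the current mean control sequence.
    candidates = mean + rng.normal(0.0, sigma, size=(n_samples,) + mean.shape)
    costs = np.array([rollout_cost(x0, u, step, cost) for u in candidates])
    # Exponential weights; subtracting the minimum keeps exp() numerically stable.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The new mean is the weight-averaged candidate sequence.
    return np.tensordot(weights, candidates, axes=1)

# Toy problem (assumed for illustration): drive a 1D integrator towards x = 1.
step = lambda x, u: x + 0.1 * u
cost = lambda x, u: abs(x - 1.0) + 0.01 * u ** 2  # non-differentiable at x = 1

rng = np.random.default_rng(0)
x0, horizon = 0.0, 20
mean = np.zeros(horizon)
initial_cost = rollout_cost(x0, mean, step, cost)
for _ in range(50):
    mean = sample_based_update(x0, mean, 1.0, step, cost, rng=rng)
final_cost = rollout_cost(x0, mean, step, cost)
print(initial_cost, final_cost)
```

The update needs only forward rollouts of the dynamics and cost, so no derivatives are required. In this sketch the exponential weights play a role loosely analogous to a likelihood applied to samples drawn from the current (prior) control distribution, which is the intuition behind the Bayesian-type reading in the abstract.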
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
