Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation

Tom Lefebvre; Guillaume Crevecoeur

arXiv:2110.02647 · cs.RO · October 7, 2021

Open Access

TL;DR

This paper links entropy regularisation in optimization to deterministic optimal control, providing a theoretical foundation for sample-based trajectory optimization methods used in robotics with non-differentiable dynamics.

Contribution

It establishes that the optimal policy is a belief function governed by Bayesian updates, connecting sample-based methods to control as inference and justifying common heuristics.
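One standard way to see how a distribution-valued optimum and a Bayesian-style update can arise from entropy regularisation is sketched below. It is an illustration under assumed notation, not necessarily the paper's own formulation: the objective f, the prior density q, and the temperature λ > 0 are symbols introduced here.

```latex
\begin{aligned}
p^\star &= \arg\min_{p}\; \mathbb{E}_{p}\big[f(x)\big] + \lambda\,\mathrm{KL}\big(p \,\|\, q\big), \\
p^\star(x) &= \frac{q(x)\,\exp\!\big(-f(x)/\lambda\big)}{\int q(x')\,\exp\!\big(-f(x')/\lambda\big)\,\mathrm{d}x'}.
\end{aligned}
```

The minimiser has the form of a posterior: the prior q is reweighted by a likelihood-like factor exp(-f/λ) and renormalised. As λ shrinks, the mass of p* concentrates on the minimisers of f within the support of q.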

Findings

01. The optimal policy is a belief function rather than a deterministic policy.

02. Sample-based trajectory optimization is rooted in control as inference.

03. Theoretical insights improve convergence and justify heuristics.

Abstract

Sample-based trajectory optimisers are a promising tool for the control of robotics with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control where the optimal policy can be expressed in terms of an expectation over stochastic paths. By estimating the expectation with Monte Carlo sampling and reinterpreting the process as exploration noise, a stochastic search algorithm is obtained tailored to (deterministic) trajectory optimisation. For the purpose of future algorithmic development, it is essential to properly understand the underlying theoretical foundations that allow for a principled derivation of such methods. In this paper we make a connection between entropy regularisation in optimisation and deterministic optimal control. We then show that the optimal policy is given by a belief function rather…
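The stochastic search idea described in the abstract (sample perturbed control sequences, weight them by exponentiated cost, average) can be sketched as follows. This is a generic path-integral-style update written for illustration, not the algorithm derived in the paper. The function names, the fixed noise scale `sigma`, the `temperature` parameter, and the toy scalar system with an absolute-value cost are all assumptions made here.

```python
import numpy as np


def rollout_cost(x0, controls, step, cost):
    """Accumulate the cost of applying a control sequence from state x0."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = step(x, u)
    return total


def sample_based_optimise(x0, step, cost, horizon=20, n_samples=256,
                          n_iters=30, sigma=0.5, temperature=1.0, seed=0):
    """Gradient-free search over open-loop control sequences.

    Assumption: Gaussian exploration noise with a fixed scale `sigma`
    around the current mean sequence; only the mean is updated.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    for _ in range(n_iters):
        # Sample candidate control sequences around the current mean.
        candidates = mean + rng.normal(0.0, sigma, size=(n_samples, horizon))
        costs = np.array([rollout_cost(x0, c, step, cost) for c in candidates])
        # Exponentiated-cost weights; subtracting the minimum avoids underflow.
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        # The new mean is the weighted average of the sampled sequences.
        mean = weights @ candidates
    return mean


# Hypothetical toy problem: scalar integrator with a non-differentiable cost.
def toy_step(x, u):
    return x + 0.1 * u


def toy_cost(x, u):
    return abs(x) + 0.01 * u ** 2
```

Calling `sample_based_optimise(1.0, toy_step, toy_cost)` returns a length-20 control sequence. Only cost evaluations are needed, so neither the dynamics nor the cost has to be differentiable.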

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research