Stochastic Optimal Control via Hilbert Space Embeddings of Distributions
Adam J. Thorpe, Meeko M. K. Oishi

TL;DR
This paper introduces a data-driven method for stochastic optimal control using kernel embeddings of distributions, transforming the problem into a linear program solvable without gradient-based methods.
Contribution
The paper applies Hilbert space embeddings of distributions to stochastic optimal control, so that approximately optimal policies can be computed by solving a linear program. The method is data-driven and handles stochastic systems with arbitrary disturbances.
Findings
Demonstrated on a linear regulation problem and a nonlinear target tracking problem.
Avoids gradient-based optimization by solving the linear program via its Lagrangian dual.
Computes approximately optimal policies for stochastic systems with arbitrary disturbances.
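To illustrate why a linear program of this kind needs no gradient steps, here is a minimal sketch. It assumes the policy is restricted to a probability vector over a finite set of candidate controls; that restriction, the function name `solve_simplex_lp`, and the cost values are assumptions of this example and are not taken from the paper. Under that assumption the Lagrangian dual of the program has a closed-form solution: the dual optimum is the smallest cost entry, and the primal optimum places all probability on that entry.

```python
import numpy as np

def solve_simplex_lp(c):
    """Minimize c @ p subject to sum(p) == 1 and p >= 0.

    The dual problem is: maximize nu subject to nu <= c[j] for all j,
    so the dual optimum is nu* = min(c). By strong duality the primal
    optimum puts all mass on an index attaining that minimum.
    """
    c = np.asarray(c, dtype=float)
    j = int(np.argmin(c))      # index of the cheapest candidate control
    p = np.zeros_like(c)
    p[j] = 1.0                 # deterministic choice (a vertex of the simplex)
    return p, float(c[j])      # primal solution and dual optimal value

# Hypothetical expected costs of four candidate control inputs.
costs = np.array([0.42, 0.17, 0.31, 0.25])
p_star, dual_value = solve_simplex_lp(costs)
print(p_star, dual_value)
```

In this setting the optimization reduces to a minimum over a finite set, which is why no step size or iteration count has to be tuned.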
Abstract
Kernel embeddings of distributions have recently gained significant attention in the machine learning community as a data-driven technique for representing probability distributions. Broadly, these techniques enable efficient computation of expectations by representing integral operators as elements in a reproducing kernel Hilbert space. We apply these techniques to the area of stochastic optimal control theory and present a method to compute approximately optimal policies for stochastic systems with arbitrary disturbances. Our approach reduces the optimization problem to a linear program, which can easily be solved via the Lagrangian dual, without resorting to gradient-based optimization algorithms. We focus on discrete-time dynamic programming, and demonstrate our proposed approach on a linear regulation problem, and on a nonlinear target tracking problem. This approach is broadly…
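The idea of computing expectations through a kernel embedding can be sketched with the standard regularized empirical estimate of a conditional distribution embedding. The sketch below is illustrative only: the toy dynamics, the Gaussian kernel, the bandwidth `sigma`, the regularization `lam`, and the helper names `rbf_kernel` and `embedding_weights` are all assumptions of this example, not the paper's experimental setup. Given sampled transitions, the expected value of a function of the next state is approximated as a weighted sum of that function evaluated at the sampled next states.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def embedding_weights(Z, z_query, lam=1e-3, sigma=1.0):
    """Weights beta(z) = (G + lam * M * I)^{-1} k(Z, z) of the empirical embedding."""
    M = Z.shape[0]
    G = rbf_kernel(Z, Z, sigma)          # Gram matrix of the sampled (state, control) pairs
    k = rbf_kernel(Z, z_query, sigma)    # kernel evaluations at the query point
    return np.linalg.solve(G + lam * M * np.eye(M), k)

# Toy data (assumed): scalar system y = 0.9 x + 0.5 u + w with Gaussian noise w.
rng = np.random.default_rng(0)
M = 400
X = rng.uniform(-1.0, 1.0, size=(M, 1))
U = rng.uniform(-1.0, 1.0, size=(M, 1))
W = 0.05 * rng.standard_normal((M, 1))
Y = 0.9 * X + 0.5 * U + W

Z = np.hstack([X, U])                    # conditioning variables (state, control)
f = lambda y: y[:, 0] ** 2               # function whose expectation we want

z_query = np.array([[0.5, -0.2]])
beta = embedding_weights(Z, z_query, lam=1e-4, sigma=0.3)

estimate = float(f(Y) @ beta[:, 0])      # approximates E[f(Y) | x = 0.5, u = -0.2]
exact = (0.9 * 0.5 + 0.5 * (-0.2)) ** 2 + 0.05 ** 2  # known only because the toy model is known
print(estimate, exact)
```

The estimate is linear in the sampled function values `f(Y)`, and it is this kind of linearity that lets an expected cost enter an optimization problem as a linear objective.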
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Stochastic processes and financial applications · Advanced Bandit Algorithms Research
