Episodic Bayesian Optimal Control with Unknown Randomness Distributions
Alexander Shapiro, Enlu Zhou, Yifan Lin, Yuhao Wang

TL;DR
This paper introduces an episodic Bayesian control method that learns unknown randomness distributions over episodes, converges to optimal policies, and is computationally efficient for convex problems, verified through inventory control experiments.
Contribution
The paper develops a novel episodic Bayesian control framework with convergence guarantees and an efficient SDDP-based computational method for convex stochastic control problems.
Findings
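Episodic value functions and policies converge almost surely to their optimal counterparts of the true problem, provided the parametrized model of the randomness distribution is correctly specified.
The value functions converge at an asymptotic rate of $O(N^{-1/2})$.
Numerical experiments on inventory control show effective performance.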
Abstract
Stochastic optimal control with unknown randomness distributions has been studied for a long time, encompassing robust control, distributionally robust control, and adaptive control. We propose a new episodic Bayesian approach that incorporates Bayesian learning with optimal control. In each episode, the approach learns the randomness distribution with a Bayesian posterior and subsequently solves the corresponding Bayesian average estimate of the true problem. The resulting policy is exercised during the episode, while additional data/observations of the randomness are collected to update the Bayesian posterior for the next episode. We show that the resulting episodic value functions and policies converge almost surely to their optimal counterparts of the true problem if the parametrized model of the randomness distribution is correctly specified. We further show that the asymptotic…
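The episodic loop described in the abstract (form a posterior, solve the Bayesian average problem, exercise the policy, collect observations, update) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: demand is exponential with an unknown rate, the prior on the rate is a conjugate Gamma, and the control problem is reduced to a single-period newsvendor whose optimal order is a quantile of the posterior predictive demand. The paper's multistage convex setting and its SDDP-based solver are not reproduced.

```python
import random


def posterior_update(alpha, beta, observations):
    """Conjugate update of a Gamma(alpha, rate=beta) prior on the rate
    of exponentially distributed demand (illustrative assumption)."""
    return alpha + len(observations), beta + sum(observations)


def bayesian_average_order(alpha, beta, critical_ratio):
    """Newsvendor order under the posterior predictive demand.

    Averaging Exp(rate) over rate ~ Gamma(alpha, beta) gives a Lomax
    distribution with survival function (beta / (beta + x)) ** alpha,
    so its quantile has a closed form.
    """
    return beta * ((1.0 - critical_ratio) ** (-1.0 / alpha) - 1.0)


def run_episodes(true_rate=0.5, n_episodes=30, horizon=20,
                 critical_ratio=0.8, seed=0):
    """Hypothetical episodic loop: solve, act, observe, update."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0  # prior hyperparameters (assumed)
    orders = []
    for _ in range(n_episodes):
        # Solve the Bayesian average problem under the current posterior.
        orders.append(bayesian_average_order(alpha, beta, critical_ratio))
        # Exercise the policy during the episode and record realized demand.
        demands = [rng.expovariate(true_rate) for _ in range(horizon)]
        # Update the posterior for the next episode.
        alpha, beta = posterior_update(alpha, beta, demands)
    return orders, (alpha, beta)


if __name__ == "__main__":
    orders, (alpha, beta) = run_episodes()
    print("first order:", round(orders[0], 3))
    print("last order:", round(orders[-1], 3))
    print("posterior mean rate:", round(alpha / beta, 3))
```

Under these assumptions the order that is optimal for the true distribution is the 0.8 quantile of an exponential with rate 0.5, roughly 3.22, and the episodic orders should move toward it as observations accumulate. This toy version only illustrates the learn-then-solve structure, not the convergence analysis in the paper.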
Taxonomy
Topics: Economic and Environmental Valuation
