Stochastic Optimal Control with Side Information and Bayesian Learning
Johannes Milz, Alexander Shapiro, Enlu Zhou

TL;DR
This paper develops a Bayesian framework for infinite-horizon stochastic optimal control with observable side information. It proves posterior consistency under Markov samples, uniform convergence of the Bayesian value function, and asymptotic normality of the data-driven contextual optimal value.
Contribution
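It introduces a Bayesian reformulation, built on a parametric density model and posterior predictive dynamics, for control when the context-conditional randomness distribution is unknown. It then proves posterior consistency, uniform convergence of the Bayesian value function, and asymptotic normality of the contextual optimal value.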
Findings
Posterior consistency under Markov samples.
Uniform convergence of the Bayesian value function, under correct specification and identifiability.
Bernstein–von Mises-type asymptotic normality of the data-driven contextual optimal value.
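To make the first finding concrete, here is a minimal toy sketch of a posterior concentrating when the samples come from a Markov chain rather than an i.i.d. sequence. Everything in it is an illustrative assumption and not taken from the paper: a two-state context chain with known transition matrix `P`, Bernoulli randomness whose success probability depends on the context, a uniform prior over a finite grid of candidate values, and the hypothetical helper name `simulate_posterior`.

```python
import numpy as np

def simulate_posterior(n_samples, seed=0):
    """Toy posterior update along one trajectory of a two-state context chain."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])  # assumed context transition matrix
    true_theta = np.array([0.3, 0.7])       # assumed P(xi = 1 | context)
    grid = np.linspace(0.05, 0.95, 19)      # finite set of candidate parameters
    log_post = np.zeros((2, grid.size))     # log of a uniform prior, per context
    s = 0
    for _ in range(n_samples):
        xi = rng.random() < true_theta[s]   # draw randomness given current context
        # Add the Bernoulli log-likelihood of the draw for every candidate value.
        log_post[s] += np.log(grid) if xi else np.log1p(-grid)
        s = rng.choice(2, p=P[s])           # context moves as a Markov chain
    # Normalize in a numerically stable way.
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return grid, post, true_theta

grid, post, true_theta = simulate_posterior(5000)
print("posterior modes:", grid[post.argmax(axis=1)], "true values:", true_theta)
```

Because the chain visits both contexts infinitely often, each context-conditional posterior keeps receiving data, which is the intuition behind consistency in this toy setting. The paper's result covers a general parametric density model and is not limited to this grid construction.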
Abstract
We study infinite-horizon stochastic optimal control problems with observable side information: a Markov chain that modulates an unknown context-conditional randomness distribution. Since this distribution is unknown, we propose a Bayesian reformulation based on a parametric density model and posterior predictive dynamics, which yields a Bayesian Bellman equation. We prove posterior consistency under Markov samples and, under correct specification and identifiability, uniform convergence of the Bayesian value function. Finally, we establish Bernstein–von Mises-type asymptotic normality for the data-driven contextual optimal value.
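The reformulation described in the abstract can be sketched on a small finite problem. The example below is only an illustration under stated assumptions, not the paper's formulation: a capacity-limited inventory model, a finite set of candidate models `models[k, s, j] = P(xi = j | context s, theta_k)`, a fixed vector of posterior weights, a known context transition matrix `P`, and hypothetical function names. It averages the candidate models into a posterior predictive distribution and then runs value iteration on the resulting Bellman equation with the posterior held fixed.

```python
import numpy as np

def posterior_predictive(weights, models):
    """Mix candidate models by posterior weight; result has shape (contexts, noise values)."""
    return np.tensordot(weights, models, axes=1)

def bayes_value_iteration(pred, P, n_levels=6, gamma=0.9, tol=1e-8):
    """Value iteration for V(x, s) when the randomness follows pred[s] (toy inventory model)."""
    n_ctx, n_xi = pred.shape
    V = np.zeros((n_levels, n_ctx))
    while True:
        V_new = np.empty_like(V)
        for x in range(n_levels):                  # current stock level
            for s in range(n_ctx):                 # observed context
                best = np.inf
                for a in range(n_levels - x):      # order size, capacity-limited
                    q = 0.0
                    for j in range(n_xi):          # demand realization
                        nxt = max(x + a - j, 0)
                        # Assumed costs: ordering, holding, and lost-sales penalty.
                        cost = 1.0 * a + 0.5 * nxt + 4.0 * max(j - x - a, 0)
                        q += pred[s, j] * (cost + gamma * P[s] @ V[nxt])
                    best = min(best, q)
                V_new[x, s] = best
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

P = np.array([[0.9, 0.1], [0.2, 0.8]])             # assumed context transitions
models = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]],            # candidate theta_0
    [[0.3, 0.4, 0.3], [0.1, 0.3, 0.6]],            # candidate theta_1
])
weights = np.array([0.7, 0.3])                      # assumed posterior weights
pred = posterior_predictive(weights, models)
V = bayes_value_iteration(pred, P)
print(V)
```

As the posterior weights concentrate on the true parameter, `pred` approaches the true context-conditional distribution. In this toy model that is the mechanism by which the Bayesian value function approaches the true one.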
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
