Stochastic Optimal Control with Side Information and Bayesian Learning

Johannes Milz; Alexander Shapiro; Enlu Zhou

arXiv:2602.22047·math.OC·February 26, 2026

Open Access

TL;DR

This paper develops a Bayesian framework for infinite-horizon stochastic optimal control with observable side information, and proves posterior consistency, uniform convergence of the Bayesian value function, and asymptotic normality of the data-driven contextual optimal value.

Contribution

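The paper introduces a Bayesian reformulation, based on a parametric density model and posterior predictive dynamics, for control problems in which the context-conditional distribution of the randomness is unknown. The reformulation yields a Bayesian Bellman equation, for which the authors prove consistency and asymptotic normality results.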

Findings

01 Posterior consistency under Markov samples.

02 Uniform convergence of the Bayesian value function, under correct specification and identifiability.

03 Bernstein–von Mises-type asymptotic normality of the data-driven contextual optimal value.

Abstract

We study infinite-horizon stochastic optimal control problems with observable side information: a Markov chain that modulates an unknown context-conditional randomness distribution. Since this distribution is unknown, we propose a Bayesian reformulation based on a parametric density model and posterior predictive dynamics, which yields a Bayesian Bellman equation. We prove posterior consistency under Markov samples and, under correct specification and identifiability, uniform convergence of the Bayesian value function. Finally, we establish Bernstein–von Mises-type asymptotic normality for the data-driven contextual optimal value.
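
To make the ingredients of the abstract concrete, here is a minimal Python sketch of a posterior, a posterior predictive distribution, and a Bellman recursion driven by that predictive distribution. It is not the authors' formulation: the inventory model, the cost coefficients, the two-element parameter set `THETAS`, the known context transition matrix `P`, and all function names are hypothetical choices made for illustration. The posterior is also held fixed during value iteration, which is a simplifying assumption.

```python
import math
import random

# Hypothetical toy model (not taken from the paper).
CONTEXTS = [0, 1]              # observable side information (Markov chain states)
DEMANDS = [0, 1, 2]            # support of the randomness xi
P = [[0.8, 0.2], [0.3, 0.7]]   # context transition matrix, assumed known here
# Finite parametric family: THETAS[theta][s][k] = p_theta(xi = DEMANDS[k] | context s).
THETAS = {
    "low": [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]],
    "high": [[0.2, 0.3, 0.5], [0.1, 0.3, 0.6]],
}


def simulate(theta, n, seed=0):
    """Draw n (context, xi) pairs along one trajectory of the context chain."""
    rng = random.Random(seed)
    s, data = 0, []
    for _ in range(n):
        xi = rng.choices(DEMANDS, weights=THETAS[theta][s])[0]
        data.append((s, xi))
        s = rng.choices(CONTEXTS, weights=P[s])[0]
    return data


def posterior(data, prior=None):
    """Posterior over theta given (context, xi) pairs, computed in log space."""
    prior = prior or {t: 1.0 / len(THETAS) for t in THETAS}
    logw = {
        t: math.log(prior[t]) + sum(math.log(THETAS[t][s][xi]) for s, xi in data)
        for t in THETAS
    }
    m = max(logw.values())
    w = {t: math.exp(v - m) for t, v in logw.items()}
    z = sum(w.values())
    return {t: v / z for t, v in w.items()}


def predictive(post):
    """Posterior predictive pmf of xi for each context: a posterior-weighted mixture."""
    return [
        [sum(post[t] * THETAS[t][s][k] for t in THETAS) for k in range(len(DEMANDS))]
        for s in CONTEXTS
    ]


def value_iteration(pred, gamma=0.9, cap=4, tol=1e-8):
    """Value iteration for a toy inventory problem with xi drawn from `pred`.

    State: (inventory x in 0..cap, context s). Control: order u with x + u <= cap.
    """
    V = [[0.0] * len(CONTEXTS) for _ in range(cap + 1)]
    while True:
        V_new = [[0.0] * len(CONTEXTS) for _ in range(cap + 1)]
        for x in range(cap + 1):
            for s in CONTEXTS:
                best = float("inf")
                for u in range(cap - x + 1):
                    q = 0.0
                    for k, xi in enumerate(DEMANDS):
                        nxt = max(x + u - xi, 0)
                        # Illustrative ordering, holding, and shortage costs.
                        cost = 1.0 * u + 0.5 * nxt + 4.0 * max(xi - x - u, 0)
                        cont = sum(P[s][s2] * V[nxt][s2] for s2 in CONTEXTS)
                        q += pred[s][k] * (cost + gamma * cont)
                    best = min(best, q)
                V_new[x][s] = best
        diff = max(abs(V_new[x][s] - V[x][s]) for x in range(cap + 1) for s in CONTEXTS)
        V = V_new
        if diff < tol:
            return V
```

A typical call chain would be `post = posterior(simulate("high", 500))` followed by `value_iteration(predictive(post))`. In this toy setting the posterior concentrates on the data-generating parameter as the sample grows, so the resulting value function approaches the one computed from the true conditional distribution; this only illustrates the flavor of the consistency and convergence results, not their proofs or assumptions.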

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems