A Fully Data-Driven Value Iteration for Stochastic LQR: Convergence, Robustness and Stability
Leilei Cui, Zhong-Ping Jiang, Petter N. Kolm, Grégoire G. Macqueron

TL;DR
This paper develops a fully data-driven value iteration method for stochastic LQR systems, proving its convergence, robustness, and stability without requiring system models or initial policies.
Contribution
It introduces a novel non-model-based adaptive dynamic programming algorithm with proven stability and robustness for unknown stochastic LQ systems.
Findings
Proves global exponential stability of value iteration in noise-free settings.
Shows small-disturbance input-to-state stability under external disturbances.
Demonstrates convergence and robustness through numerical experiments on data center cooling and portfolio allocation.
Abstract
Unlike traditional model-based reinforcement learning approaches that estimate system parameters from data, non-model-based data-driven control learns the optimal policy directly from input-state data without any intermediate model identification. Although this direct reinforcement learning approach offers increased adaptability and resilience to model misspecification, its reliance on raw data leaves it vulnerable to system noise and disturbances that may undermine convergence, robustness, and stability. In this article, we establish the convergence, robustness, and stability of value iteration (VI) for data-driven control of stochastic linear quadratic (LQ) systems in discrete time with entirely unknown dynamics and cost. Our contributions are three-fold. First, we prove that VI is globally exponentially stable for any positive semidefinite initial value matrix in noise-free settings,…
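For readers unfamiliar with value iteration in the LQ setting, the following is a minimal sketch of the classical model-based Riccati recursion for a deterministic discrete-time LQR problem. It is not the paper's data-driven algorithm: it assumes the matrices A, B, Q, and R are known, whereas the paper works from input-state data alone. The function name, the example system, and the stopping tolerance are all illustrative assumptions. The sketch only shows the underlying fixed-point iteration on the value matrix P, started from a positive semidefinite initial guess.

```python
import numpy as np


def riccati_value_iteration(A, B, Q, R, P0=None, tol=1e-10, max_iter=10_000):
    """Model-based value iteration for discrete-time LQR (illustrative sketch).

    Iterates P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA from a positive
    semidefinite P0 (zero by default) until successive iterates differ by
    less than `tol` in Frobenius norm.
    """
    n = A.shape[0]
    P = np.zeros((n, n)) if P0 is None else np.array(P0, dtype=float)

    def gain(P):
        # Feedback gain K such that u = -K x is greedy with respect to P.
        return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

    for k in range(1, max_iter + 1):
        P_next = Q + A.T @ P @ (A - B @ gain(P))
        P_next = 0.5 * (P_next + P_next.T)  # remove round-off asymmetry
        if np.linalg.norm(P_next - P, ord="fro") < tol:
            return P_next, gain(P_next), k
        P = P_next
    raise RuntimeError("value iteration did not converge within max_iter")


# Hypothetical example system: a discretized double integrator.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K, iters = riccati_value_iteration(A, B, Q, R)

# Spectral radius of the closed loop A - BK; a value below 1 means the
# resulting policy u = -K x stabilizes the example system.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(f"converged in {iters} iterations, closed-loop spectral radius {rho:.4f}")
```

Passing a different positive semidefinite `P0` should lead to the same limit on this example, which is the model-based analogue of the global convergence property summarized in the findings.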
