A Fully Data-Driven Value Iteration for Stochastic LQR: Convergence, Robustness and Stability

Leilei Cui; Zhong-Ping Jiang; Petter N. Kolm; Grégoire G. Macqueron

arXiv:2505.02970 · math.OC · May 5, 2026

TL;DR

This paper develops a fully data-driven value iteration (VI) method for stochastic linear quadratic regulation (LQR) with unknown dynamics and cost, and proves its convergence, robustness, and stability without requiring a system model or an initial policy.

Contribution

It introduces a novel non-model-based adaptive dynamic programming algorithm with proven stability and robustness for unknown stochastic LQ systems.

Findings

01

Proves that value iteration is globally exponentially stable in noise-free settings for any positive semidefinite initial value matrix.

02

Shows small-disturbance input-to-state stability under external disturbances.

03

Demonstrates convergence and robustness through numerical experiments on data center cooling and portfolio allocation.

Abstract

Unlike traditional model-based reinforcement learning approaches that estimate system parameters from data, non-model-based data-driven control learns the optimal policy directly from input-state data without any intermediate model identification. Although this direct reinforcement learning approach offers increased adaptability and resilience to model misspecification, its reliance on raw data leaves it vulnerable to system noise and disturbances that may undermine convergence, robustness, and stability. In this article, we establish the convergence, robustness, and stability of value iteration (VI) for data-driven control of stochastic linear quadratic (LQ) systems in discrete time with entirely unknown dynamics and cost. Our contributions are three-fold. First, we prove that VI is globally exponentially stable for any positive semidefinite initial value matrix in noise-free settings,…
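To make the object of the first result concrete, here is a minimal sketch of value iteration for a scalar, noise-free LQ problem. It is not the paper's data-driven algorithm: it assumes the model is known and simply iterates the standard Riccati recursion. The function name `riccati_vi_scalar` and the numbers a, b, q, r are hypothetical and chosen only for illustration. The point is that the iteration can start from any nonnegative initial value (the scalar analogue of a positive semidefinite matrix) and needs no stabilizing initial policy.

```python
def riccati_vi_scalar(a, b, q, r, p0=0.0, tol=1e-12, max_iter=10_000):
    """Model-based value iteration for a scalar, noise-free LQ problem.

    Assumed setup (illustrative, not taken from the paper):
      dynamics   x_{k+1} = a * x_k + b * u_k
      stage cost q * x_k**2 + r * u_k**2, with q >= 0 and r > 0
    Returns (p, gain, iterations), where the value function is p * x**2
    and the resulting feedback is u = -gain * x.
    """
    p = p0
    for k in range(1, max_iter + 1):
        # Riccati recursion: one Bellman backup of the quadratic value.
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    gain = a * b * p / (r + b * b * p)
    return p, gain, k


if __name__ == "__main__":
    # Open-loop unstable plant (|a| > 1); no initial policy is supplied.
    for p0 in (0.0, 10.0):
        p, gain, iters = riccati_vi_scalar(1.2, 1.0, 1.0, 1.0, p0=p0)
        print(f"p0={p0}: p={p:.6f}, gain={gain:.6f}, iterations={iters}")
```

In this toy setting both starting values reach the same fixed point, and the resulting closed-loop factor a - b * gain has magnitude below one. The paper's contribution is to obtain this kind of behavior directly from input-state data, with stochastic dynamics and without knowing a, b, q, or r.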
