Learning-based primal-dual optimal control of discrete-time stochastic systems with multiplicative noise

Xiushan Jiang; Weihai Zhang

arXiv:2506.02613·math.OC·June 4, 2025

TL;DR

This paper develops a primal-dual approach to solve stochastic linear quadratic regulator problems with multiplicative noise, providing theoretical insights and model-free algorithms for optimal control without full system knowledge.

Contribution

The paper introduces a primal-dual reformulation of the stochastic linear quadratic regulator (SLQR) problem with multiplicative noise, establishes strong duality, and derives algorithms from the KKT conditions for model-free optimal control.

Findings

1. The primal-dual formulation enables theoretical analysis of RL algorithms.
2. Model-free algorithms are validated through an illustrative example.
3. The approach offers a new foundation for understanding RL in stochastic control.

Abstract

Reinforcement learning (RL) is an effective approach for solving optimal control problems without exact knowledge of the system model. However, the classical Q-learning method, a model-free RL algorithm, has limitations, such as a lack of rigorous theoretical analysis and the need for artificial disturbances during implementation. To address these challenges, this paper studies the partially model-free stochastic linear quadratic regulator (SLQR) problem for a system with multiplicative noise from a primal-dual perspective. This approach lays a strong theoretical foundation for understanding the intrinsic mechanisms of classical RL algorithms. We reformulate the SLQR as a non-convex primal-dual optimization problem and derive a strong duality result, which enables us to provide model-based and model-free algorithms for SLQR optimal policy design based on the…
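
As a point of reference for the model-based side, the sketch below runs a standard Riccati value iteration for an SLQR with multiplicative noise. It is not the paper's primal-dual algorithm. The system form x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k with zero-mean, unit-variance scalar noise w_k, the cost weights Q and R, the function names, and all numerical values are assumptions chosen for illustration.

```python
import numpy as np


def slqr_riccati_iteration(A, B, C, D, Q, R, iters=2000, tol=1e-10):
    """Value iteration on the generalized Riccati equation (assumed system form).

    Assumed dynamics: x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k,
    with E[w_k] = 0 and E[w_k^2] = 1, and stage cost x'Qx + u'Ru.
    Returns the cost matrix P and the gain K of the feedback u = K x.
    """
    n = A.shape[0]
    P = np.zeros((n, n))
    for _ in range(iters):
        G = R + B.T @ P @ B + D.T @ P @ D   # input-weighting term
        H = B.T @ P @ A + D.T @ P @ C       # state-input cross term
        P_next = Q + A.T @ P @ A + C.T @ P @ C - H.T @ np.linalg.solve(G, H)
        P_next = 0.5 * (P_next + P_next.T)  # keep P numerically symmetric
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    G = R + B.T @ P @ B + D.T @ P @ D
    H = B.T @ P @ A + D.T @ P @ C
    K = -np.linalg.solve(G, H)
    return P, K


def mean_square_spectral_radius(A, B, C, D, K):
    """Spectral radius of the second-moment map X -> Acl X Acl' + Ccl X Ccl'."""
    Acl = A + B @ K
    Ccl = C + D @ K
    M = np.kron(Acl, Acl) + np.kron(Ccl, Ccl)
    return np.max(np.abs(np.linalg.eigvals(M)))


# Illustrative (hypothetical) problem data.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
D = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = slqr_riccati_iteration(A, B, C, D, Q, R)
rho = mean_square_spectral_radius(A, B, C, D, K)
print("P =\n", P)
print("K =", K)
print("mean-square spectral radius =", rho)
```

A spectral radius below 1 indicates that the closed loop is mean-square stable under the assumed noise model. A model-free method would have to recover the same gain from data rather than from A, B, C, and D.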

Taxonomy

Topics: Advanced Control Systems Optimization