Primal-dual policy learning for mean-field stochastic LQR problem

Xiushan Jiang; Dong Wang; Weihai Zhang; Daniel W. C. Ho; Yuanqing Wu

arXiv:2512.08205·math.OC·December 10, 2025

TL;DR

This paper introduces a primal-dual policy learning method for the mean-field stochastic LQR (MF-SLQR) problem, yielding a partially model-free controller design.

Contribution

The paper develops a primal-dual optimization framework for MF-SLQR: it transforms the problem into a static nonconvex constrained optimization problem and proposes a partially model-free algorithm that is linked to policy iteration.

Findings

01. Validated the method with a high-dimensional example

02. Established strong duality for the MF-SLQR problem

03. Connected the approach to classical policy iteration
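
For context on the policy-iteration connection, the following is a minimal sketch of classical policy iteration (Hewer's algorithm) for a standard deterministic, discrete-time LQR problem. It is not the paper's MF-SLQR algorithm and contains no mean-field or stochastic terms. The system matrices, the initial gain `K0`, and all function names are illustrative assumptions, and the method requires `K0` to be stabilizing.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK)."""
    Acl = A - B @ K                      # closed-loop matrix under u = -K x
    M = Q + K.T @ R @ K                  # stage-cost matrix under this policy
    n = A.shape[0]
    # Vectorized Lyapunov equation: (I - Acl' kron Acl') vec(P) = vec(M)
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)               # symmetrize against round-off

def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, iters=50, tol=1e-10):
    """Alternate evaluation and improvement until the gain stops changing."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        K_new = improve_policy(A, B, R, P)
        converged = np.linalg.norm(K_new - K) < tol
        K = K_new
        if converged:
            break
    return K, evaluate_policy(A, B, Q, R, K)

# Hypothetical example: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])              # assumed stabilizing initial gain
K, P = policy_iteration(A, B, Q, R, K0)
```

The evaluation step here uses the model matrices `A` and `B` directly. Data-driven variants replace that step with estimates obtained from trajectories, which is the general sense in which a method can be called partially model-free.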

Abstract

Integrating data-driven techniques with mechanism-driven insights has recently gained popularity as a powerful learning approach to solving traditional LQR problems for designing intelligent controllers in complex dynamic systems. However, the theoretical understanding of various reinforcement learning algorithms needs further exploration to enhance their efficiency and safety. In this article, by means of primal-dual optimization tools, we study the partially model-free design of the mean-field stochastic LQR (MF-SLQR) controller using a policy learning approach. Firstly, by designing appropriate optimizing variables, the considered MF-SLQR problem is transformed into a new static nonconvex constrained optimization problem with equivalence preserved in certain senses. After that, the equivalent formulation of the duality results is constructed via finding the solution of the…

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Extremum Seeking Control Systems