Primal-dual policy learning for mean-field stochastic LQR problem
Xiushan Jiang, Dong Wang, Weihai Zhang, Daniel W. C. Ho, Yuanqing Wu

TL;DR
This paper introduces a primal-dual policy learning method for the mean-field stochastic LQR (MF-SLQR) problem, yielding a partially model-free controller design that combines data-driven and model-based techniques.
Contribution
It develops a primal-dual optimization framework for the mean-field stochastic LQR (MF-SLQR) problem, recasts the problem as a static nonconvex constrained optimization problem, and proposes a partially model-free algorithm that is linked to classical policy iteration.
Findings
Validated the method with a high-dimensional example
Established strong duality for the MF-SLQR problem
Connected the approach to classical policy iteration
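For readers unfamiliar with the classical policy iteration mentioned above, the following is a minimal sketch of its structure on a scalar stochastic LQR problem. It is not the paper's algorithm: it assumes a fully known model dx = (a x + b u) dt + c x dW with cost rate q x^2 + r u^2, omits the mean-field term entirely, and all function and parameter names are hypothetical. Each iteration evaluates the current gain by solving a (generalized) Lyapunov equation, then improves the gain from the resulting value.

```python
import math


def evaluate_policy(a, b, c, q, r, k):
    """Policy evaluation for u = -k x (illustrative scalar case).

    Solves (2(a - b k) + c^2) p + q + r k^2 = 0 for p, which requires
    the gain k to be mean-square stabilizing: 2(a - b k) + c^2 < 0.
    """
    margin = 2.0 * (b * k - a) - c ** 2
    if margin <= 0:
        raise ValueError("gain k is not mean-square stabilizing")
    return (q + r * k ** 2) / margin


def policy_iteration(a, b, c, q, r, k0, iters=20):
    """Alternate policy evaluation and policy improvement."""
    k = k0
    for _ in range(iters):
        p = evaluate_policy(a, b, c, q, r, k)  # evaluation step
        k = b * p / r                          # improvement step
    return k, evaluate_policy(a, b, c, q, r, k)


def riccati_solution(a, b, c, q, r):
    """Positive root of (2a + c^2) p + q - (b^2 / r) p^2 = 0, for comparison."""
    s = 2.0 * a + c ** 2
    return r * (s + math.sqrt(s ** 2 + 4.0 * b ** 2 * q / r)) / (2.0 * b ** 2)


if __name__ == "__main__":
    # Assumed toy parameters; k0 = 2.0 is stabilizing for these values.
    k, p = policy_iteration(a=1.0, b=1.0, c=0.5, q=1.0, r=1.0, k0=2.0)
    print("gain:", k, "value:", p)
    print("Riccati root:", riccati_solution(1.0, 1.0, 0.5, 1.0, 1.0))
```

Starting from a stabilizing gain, the iterates in this toy setting approach the positive root of the scalar Riccati-type equation. The paper's setting additionally involves the mean-field term and a partially model-free evaluation step, neither of which is reproduced here.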
Abstract
Integrating data-driven techniques with mechanism-driven insights has recently gained popularity as a powerful learning approach to solving traditional LQR problems for designing intelligent controllers in complex dynamic systems. However, the theoretical understanding of various reinforcement learning algorithms needs further exploration to enhance their efficiency and safety. In this article, by means of primal-dual optimization tools, we study the partially model-free design of the mean-field stochastic LQR (MF-SLQR) controller using a policy learning approach. Firstly, by designing appropriate optimizing variables, the considered MF-SLQR problem is transformed into a new static nonconvex constrained optimization problem with equivalence preserved in certain senses. After that, the equivalent formulation of the duality results is constructed via finding the solution of the…
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Extremum Seeking Control Systems
