Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

Man Li; Jiahu Qin; Wei Xing Zheng; Yaonan Wang; and Yu Kang

arXiv:2103.09407·eess.SY·March 18, 2021·1 cites

Open Access

TL;DR

This paper introduces a model-free off-policy reinforcement learning algorithm for stochastic LQR control, analyzes its convergence, and connects it with primal-dual optimization, providing new insights and practical algorithms for linear systems with noise.

Contribution

It develops a novel model-free off-policy policy iteration algorithm and links primal-dual optimization with classical policy iteration for stochastic LQR control.

Findings

1. The MF-OPPI algorithm converges similarly to classical policy iteration.
2. The primal-dual approach effectively solves the non-convex LQR optimization.
3. Simulation demonstrates the effectiveness of the proposed methods.

Abstract

To better understand the mechanisms underlying various reinforcement learning (RL) algorithms, and to make better use of optimization theory to advance RL, many researchers have begun to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple yet captures the characteristics of RL. Motivated by this, this work addresses the model-free design of a stochastic LQR controller for linear systems subject to Gaussian noise, from the perspectives of both RL and primal-dual optimization. From the RL perspective, we first develop a new model-free off-policy policy iteration (MF-OPPI) algorithm, in which the sampled data are reused repeatedly to update the policy, alleviating the data-hungry problem to some extent. We then provide a rigorous convergence analysis by showing that the involved iterations are equivalent to the iterations in…
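For orientation, the classical policy iteration that the abstract refers to can be sketched as follows. This is not the paper's MF-OPPI algorithm: it is the standard model-based iteration for a noise-free, undiscounted, discrete-time LQR problem, and it assumes the system matrices are known. The discrete-time setting, the example matrices A, B, Q, R, the initial gain K0, and the helper names are all assumptions made for illustration.

```python
import numpy as np


def solve_discrete_lyapunov(a_cl, m):
    """Solve P = a_cl.T @ P @ a_cl + m by vectorization (small systems only)."""
    n = a_cl.shape[0]
    # Row-major vec identity: vec(X.T-type product a_cl.T @ P @ a_cl)
    # equals kron(a_cl.T, a_cl.T) @ vec(P).
    lhs = np.eye(n * n) - np.kron(a_cl.T, a_cl.T)
    p = np.linalg.solve(lhs, m.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # symmetrize against round-off


def policy_iteration(a, b, q, r, k0, iters=30):
    """Model-based policy iteration for the control law u = -K x."""
    k = k0
    for _ in range(iters):
        a_cl = a - b @ k
        # Policy evaluation: cost matrix of the current gain.
        p = solve_discrete_lyapunov(a_cl, q + k.T @ r @ k)
        # Policy improvement: greedy gain with respect to that cost.
        k = np.linalg.solve(r + b.T @ p @ b, b.T @ p @ a)
    return p, k


# Hypothetical example: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # assumed stabilizing initial gain

P, K = policy_iteration(A, B, Q, R, K0)
print("gain K:", K)
```

The iteration needs a stabilizing initial gain; with the values above, the closed-loop matrix A - B K0 has both eigenvalues at 0.9. A model-free method such as the one described in the abstract would replace the two steps that use A and B directly with estimates built from sampled data.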

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Fault Detection and Control Systems