A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: convergence and stabilization

Kai Du, Qingxin Meng, and Fu Zhang

arXiv:2011.04970 · math.OC · November 11, 2020 · SIAM J. Control. Optim. · 1 cites

Open Access

TL;DR

This paper introduces a Q-learning-based algorithm for discrete-time linear-quadratic control problems with random parameters, and proves convergence and system stabilization without prior statistical knowledge of the parameters.

Contribution

It develops an online iterative Q-learning algorithm for systems with unknown parameter distributions, establishing convergence and stabilization results.

Findings

01

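The learning sequence converges if and only if the control problem is well-posed, which in turn holds if and only if the algebraic Riccati equation is solvable.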

02

The control law stabilizes the system when the problem is well-posed.

03

Numerical examples validate theoretical results.

Abstract

This paper studies an infinite horizon optimal control problem for discrete-time linear systems and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. A classical approach is to solve an algebraic Riccati equation that involves mathematical expectations and requires certain statistical information of the parameters. In this paper, we propose an online iterative algorithm in the spirit of Q-learning for the situation where only one random sample of parameters emerges at each time step. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that the adaptive feedback control in terms of the learning sequence stabilizes the system as long as the control…
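The idea of learning from one parameter sample per time step can be illustrated with a minimal sketch. This is not the paper's algorithm, whose exact update rule is not given in this summary. It is a generic stochastic-approximation iteration on the matrix H of a quadratic Q-function Q(x, u) = [x; u]' H [x; u], written under several assumptions: the system is x_{t+1} = A_t x_t + B_t u_t, the stage cost is x' Q_t x + u' R_t u with no cross term, and the step size is 1/(t+1). The names `schur_value`, `greedy_gain`, `learn_q_matrix`, and `sample_params`, as well as the scalar example, are hypothetical.

```python
import numpy as np

def schur_value(H, n):
    """Value matrix P = H_xx - H_xu H_uu^{-1} H_ux, i.e. min over u of the quadratic form."""
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    return Hxx - Hxu @ np.linalg.solve(Huu, Hxu.T)

def greedy_gain(H, n):
    """Gain K such that u = -K x minimizes [x; u]' H [x; u] over u."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])

def learn_q_matrix(sample_params, n, m, steps=20000, seed=0):
    """Average one-sample Bellman targets; assumed step size 1/(t+1)."""
    rng = np.random.default_rng(seed)
    H = np.eye(n + m)  # arbitrary positive definite starting guess
    for t in range(steps):
        A, B, Q, R = sample_params(rng)  # only one sample is seen per step
        AB = np.hstack([A, B])
        N = np.block([[Q, np.zeros((n, m))], [np.zeros((m, n)), R]])
        # One-sample estimate of E[N + [A B]' P [A B]] with P taken from the current H
        target = N + AB.T @ schur_value(H, n) @ AB
        H = H + (target - H) / (t + 1)
    return H

# Hypothetical scalar example: random A_t, fixed B, Q, R (all values are assumptions)
def sample_params(rng):
    A = np.array([[0.9 + 0.1 * rng.standard_normal()]])
    B = np.array([[1.0]])
    Q = np.array([[1.0]])
    R = np.array([[1.0]])
    return A, B, Q, R

H = learn_q_matrix(sample_params, n=1, m=1)
P = schur_value(H, 1)   # estimate of the Riccati solution
K = greedy_gain(H, 1)   # feedback law u = -K x
print("P =", P, "K =", K)
```

The update never uses the mean or variance of the parameters directly, only the samples as they arrive, which mirrors the setting described in the abstract. Whether this particular iteration enjoys the convergence and stabilization guarantees stated in the paper is not something this sketch establishes.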

Taxonomy

Topics: Adaptive Dynamic Programming Control · Advanced Control Systems Optimization · Control Systems and Identification