Convergence results for an averaged LQR problem with applications to reinforcement learning

Andrea Pesare; Michele Palladino; Maurizio Falcone

arXiv:2011.03447·math.OC·January 13, 2022·Math. Control. Signals Syst.

TL;DR

This paper proves that the optimal control derived from an averaged LQR problem with uncertain dynamics converges to the true optimal control as the agent's knowledge improves, with implications for reinforcement learning.

Contribution

It introduces a convergence analysis for averaged LQR solutions under uncertain dynamics, connecting to model-based reinforcement learning.

Findings

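1. The optimal control of the averaged LQR problem converges to the optimal control for the true dynamics as the distribution π becomes more accurate.

2. Numerical experiments support the theoretical results.

3. The framework is closely related to model-based reinforcement learning.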

Abstract

In this paper, we study a Linear Quadratic Optimal Control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge an agent has of the current system is represented by a probability distribution $π$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to account for the experience the agent gains while exploring the environment, so that it approximates the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "average" Linear Quadratic Optimal Control problem with respect to a certain $π$ converges to the optimal control of the Linear Quadratic Optimal Control problem governed by the actual, underlying dynamics. This approach is closely related to model-based Reinforcement…
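The convergence claim in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the setting is a discrete-time, finite-horizon analogue with an open-loop control sequence, the distribution $π$ is replaced by a finite set of sampled matrix pairs centred on the true dynamics, and the names `stacked_maps`, `averaged_open_loop_control`, `A_true`, `B_true` and `sigmas` are hypothetical. The paper's own formulation and method may differ.

```python
import numpy as np


def stacked_maps(A, B, N):
    """Return F, G such that (x_1, ..., x_N) = F x_0 + G (u_0, ..., u_{N-1})
    for the dynamics x_{k+1} = A x_k + B u_k."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(N):
        powers.append(A @ powers[-1])  # powers[k] == A^k
    F = np.zeros((N * n, n))
    G = np.zeros((N * n, N * m))
    for i in range(N):
        F[i * n:(i + 1) * n, :] = powers[i + 1]
        for j in range(i + 1):
            G[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B
    return F, G


def averaged_open_loop_control(samples, Q, R, Qf, x0, N):
    """Minimize the sample average of the quadratic cost over open-loop controls.

    Each sample is a pair (A, B); the averaged cost is quadratic in the stacked
    control U, so the minimizer solves one linear system H U = -g.
    """
    n, m = Q.shape[0], R.shape[0]
    Qbar = np.kron(np.eye(N), Q)
    Qbar[-n:, -n:] = Qf  # terminal weight on x_N
    Rbar = np.kron(np.eye(N), R)
    H = np.zeros((N * m, N * m))
    g = np.zeros(N * m)
    for A, B in samples:
        F, G = stacked_maps(A, B, N)
        H += G.T @ Qbar @ G
        g += G.T @ Qbar @ F @ x0
    H = H / len(samples) + Rbar
    g = g / len(samples)
    return np.linalg.solve(H, -g)


# Hypothetical "true" system (a discretized double integrator) and cost weights.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), np.eye(2)
x0, N = np.array([1.0, 0.0]), 8

# Optimal control for the true dynamics (a "distribution" with a single atom).
u_true = averaged_open_loop_control([(A_true, B_true)], Q, R, Qf, x0, N)

# Fixed noise draws, rescaled by sigma: a smaller sigma models better knowledge.
rng = np.random.default_rng(0)
noise = [(rng.standard_normal((2, 2)), rng.standard_normal((2, 1)))
         for _ in range(200)]

sigmas = [0.2, 0.05, 0.01, 0.0]
errors = []
for s in sigmas:
    samples = [(A_true + s * EA, B_true + s * EB) for EA, EB in noise]
    u_avg = averaged_open_loop_control(samples, Q, R, Qf, x0, N)
    errors.append(float(np.linalg.norm(u_avg - u_true)))

for s, e in zip(sigmas, errors):
    print(f"sigma = {s:<5} ||u_avg - u_true|| = {e:.3e}")
```

In this sketch the gap between the averaged control and the true optimal control shrinks as the samples concentrate around the true matrices, which mirrors the qualitative statement of the abstract without reproducing its proof or its experiments.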
