Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem
Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, Mihailo R. Jovanović

TL;DR
This paper analyzes the convergence and sample complexity of gradient-based methods for the model-free linear quadratic regulator problem, providing theoretical guarantees and bounds for their efficiency in unknown systems.
Contribution
It establishes stability and convergence results for gradient flows and discretizations, and derives sample complexity bounds for model-free RL in LQR problems.
Findings
The gradient-flow dynamics are exponentially stable over the set of stabilizing feedback gains.
Gradient descent, obtained by forward Euler discretization of the gradient flow, enjoys a similar stability guarantee.
Sample complexity scales logarithmically with inverse accuracy.
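To make the first two findings concrete, here is a minimal sketch of gradient descent on a scalar continuous-time LQR problem with a known model. It is an illustration under stated assumptions, not the authors' code: the problem data (A, B, Q, R, OMEGA), the step size, the stopping tolerance, and the simple backtracking rule are all hypothetical choices. The cost and gradient come from the scalar versions of the two Lyapunov equations that define the LQR cost for a stabilizing gain.

```python
import math

# Hypothetical scalar problem data: dx/dt = A*x + B*u, u = -k*x,
# cost = integral of (Q*x^2 + R*u^2) dt, OMEGA = E[x(0)^2].
A, B, Q, R, OMEGA = 1.0, 1.0, 1.0, 1.0, 1.0


def cost_and_grad(k):
    """Return (cost, gradient) at gain k; cost is inf if k is not stabilizing."""
    closed_loop = A - B * k
    if closed_loop >= 0.0:
        return math.inf, 0.0
    # Scalar Lyapunov equations: 2*closed_loop*p = -(Q + R*k^2), 2*closed_loop*x = -OMEGA.
    p = (Q + R * k * k) / (-2.0 * closed_loop)
    x = OMEGA / (-2.0 * closed_loop)
    return p * OMEGA, 2.0 * (R * k - B * p) * x


def gradient_descent(k0, step=0.5, eps=1e-6, max_iter=10_000):
    """Forward Euler discretization of the gradient flow, with step halving
    so that every iterate stays stabilizing and lowers the cost."""
    k = k0
    cost, grad = cost_and_grad(k)
    if math.isinf(cost):
        raise ValueError("initial gain must be stabilizing")
    for iteration in range(max_iter):
        if abs(grad) < eps:
            return k, iteration
        alpha = step
        for _ in range(60):
            candidate = k - alpha * grad
            new_cost, new_grad = cost_and_grad(candidate)
            if new_cost < cost:
                break
            alpha *= 0.5
        else:
            return k, iteration  # no decrease found at any tried step size
        k, cost, grad = candidate, new_cost, new_grad
    return k, max_iter


# Closed-form optimal gain from the scalar Riccati equation, for comparison.
K_STAR = (A + math.sqrt(A * A + B * B * Q / R)) / B
k_final, iterations = gradient_descent(k0=2.0)
print(k_final, K_STAR, iterations)
```

Tightening eps should add only a modest number of iterations in this example, which is the behavior one expects from a linearly convergent method.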
Abstract
Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems and the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler…
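The model-free setting described in the abstract can be sketched with a generic two-point random search, in which the gradient is replaced by a finite-difference estimate built from simulated costs only. This is a hedged illustration rather than the paper's algorithm: the double-integrator plant, the forward Euler simulator, the horizon, the smoothing radius, the step size, and all function names below are assumptions made for the example.

```python
import math
import random

DT, STEPS = 0.02, 500  # hypothetical simulation step and number of steps


def simulated_cost(k1, k2):
    """Finite-horizon cost of u = -(k1*x1 + k2*x2) on a double integrator
    (dx1/dt = x2, dx2/dt = u), summed over two unit initial states.
    Only rollouts are used; no model matrices enter the optimizer."""
    total = 0.0
    for x1, x2 in ((1.0, 0.0), (0.0, 1.0)):
        for _ in range(STEPS):
            u = -(k1 * x1 + k2 * x2)
            total += (x1 * x1 + x2 * x2 + u * u) * DT
            x1, x2 = x1 + DT * x2, x2 + DT * u
    return total


def two_point_gradient(k1, k2, radius, rng):
    """Estimate the gradient from two cost evaluations along a random unit direction."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    d1, d2 = math.cos(theta), math.sin(theta)
    diff = (simulated_cost(k1 + radius * d1, k2 + radius * d2)
            - simulated_cost(k1 - radius * d1, k2 - radius * d2))
    # The factor 2 is the number of gain entries; it makes the estimate
    # match the gradient on average over directions.
    scale = 2.0 * diff / (2.0 * radius)
    return scale * d1, scale * d2


def random_search(k1, k2, step=0.1, radius=0.01, iterations=200, seed=0):
    """Gradient descent driven by two-point estimates instead of exact gradients."""
    rng = random.Random(seed)
    for _ in range(iterations):
        g1, g2 = two_point_gradient(k1, k2, radius, rng)
        k1, k2 = k1 - step * g1, k2 - step * g2
    return k1, k2


k1, k2 = random_search(2.0, 2.0)
print(k1, k2, simulated_cost(k1, k2))
```

For this plant with unit weights, the continuous-time optimal gain is (1, sqrt(3)). The sketch should land close to it but not exactly on it, since the simulated cost is a discretized, finite-horizon approximation of the true cost.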
Taxonomy
Methods: Random Search
