A model-free first-order method for linear quadratic regulator with $\tilde{O}(1/\varepsilon)$ sampling complexity

Caleb Ju; Georgios Kotsalis; Guanghui Lan

arXiv:2212.00084·math.OC·February 21, 2025·1 cites

TL;DR

This paper introduces a model-free first-order policy gradient method for stochastic LQR that achieves $\tilde{O}(1/\varepsilon)$ sampling complexity without assuming that all policies generated by the algorithm are stable almost surely. The rate matches the best-known model-based rate up to logarithmic factors.

Contribution

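It presents a novel actor-critic algorithm for stochastic LQR with improved $\tilde{O}(1/\varepsilon)$ sample complexity, matching model-based rates up to logarithmic factors and dropping the assumption that every policy generated by the algorithm is stable almost surely.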

Findings

01. Achieves $\tilde{O}(1/\varepsilon)$ sample complexity for stochastic LQR.

02. Utilizes a variational inequality formulation and a stochastic primal-dual critic.

03. Demonstrates optimal convergence rates with a multi-epoch scheme.

Abstract

We consider the classic stochastic linear quadratic regulator (LQR) problem under an infinite horizon average stage cost. By leveraging recent policy gradient methods from reinforcement learning, we obtain a first-order method that finds a stable feedback law whose objective function gap to the optimum is at most $\varepsilon$ with high probability using $\tilde{O}(1/\varepsilon)$ samples, where $\tilde{O}$ hides polylogarithmic dependence on $\varepsilon$. Our proposed method seems to have the best dependence on $\varepsilon$ within the model-free literature without the assumption that all policies generated by the algorithm are stable almost surely, and it matches the best-known rate from the model-based literature, up to logarithmic factors. The improved dependence on $\varepsilon$ is achieved by showing the accuracy scales with the variance rather than the standard deviation of the…
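To make the objective in the abstract concrete, the following is a minimal sketch of the infinite-horizon average stage cost of a fixed linear feedback law. It is not the paper's algorithm. It assumes the textbook stochastic LQR setup, with dynamics $x_{t+1} = A x_t + B u_t + w_t$, Gaussian noise $w_t \sim N(0, W)$, stage cost $x_t^\top Q x_t + u_t^\top R u_t$, and policy $u_t = -K x_t$. All symbols, the function names `average_cost_exact` and `average_cost_sampled`, and the numerical values are hypothetical and chosen for illustration. The first function uses the model (a Lyapunov equation); the second uses only sampled transitions, which is the kind of quantity a model-free method has access to.

```python
import numpy as np


def average_cost_exact(A, B, Q, R, K, W):
    """Exact long-run average stage cost of u = -K x (uses the model; illustrative)."""
    A_cl = A - B @ K  # closed-loop dynamics matrix
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K is not stabilizing; the average cost need not be finite")
    n = A.shape[0]
    M = Q + K.T @ R @ K  # stage cost equals x^T M x under u = -K x
    # Solve the Lyapunov equation P = A_cl^T P A_cl + M by vectorization.
    kron = np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(np.eye(n * n) - kron, M.reshape(-1)).reshape(n, n)
    return float(np.trace(P @ W))  # average cost equals trace(P W)


def average_cost_sampled(A, B, Q, R, K, W, T, seed=0):
    """Monte Carlo estimate of the same cost from one trajectory of T samples."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = np.linalg.cholesky(W)  # assumes W is positive definite
    x = np.zeros(n)
    total = 0.0
    for _ in range(T):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + L @ rng.standard_normal(n)
    return total / T


if __name__ == "__main__":
    # Hypothetical double-integrator-like system; K places both closed-loop poles at 0.9.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R, W = np.eye(2), np.eye(1), 0.01 * np.eye(2)
    K = np.array([[1.0, 2.0]])
    print("exact:  ", average_cost_exact(A, B, Q, R, K, W))
    print("sampled:", average_cost_sampled(A, B, Q, R, K, W, T=20000))
```

The stability check in `average_cost_exact` reflects why the abstract stresses a stable feedback law: for a gain whose closed loop is not stable, the Lyapunov equation above has no valid solution and the sampled average generally does not settle.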

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control