Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach

Leonardo F. Toso; Han Wang; James Anderson

arXiv:2309.10679 · math.OC · September 20, 2023

TL;DR

This paper introduces an oracle-efficient stochastic variance-reduced policy gradient (SVRPG) method for model-free LQR that reduces the number of expensive two-point cost queries needed to find an ε-approximate optimal policy.

Contribution

It proposes a novel dual-loop variance-reduced algorithm that combines one-point and two-point gradient estimates, reducing the number of two-point cost queries needed to reach an ε-approximate solution.
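To make the one-point versus two-point distinction concrete, the sketch below uses the standard zeroth-order smoothing estimators from the derivative-free LQR literature, with $U$ drawn uniformly from the unit sphere in $\mathbb{R}^{d}$, $d = mn$: the one-point estimate $(d/r)\,C(K + rU)\,U$ and the two-point estimate $(d/2r)\,[C(K + rU) - C(K - rU)]\,U$. The paper's exact estimators may differ. The two-state system, the smoothing radius `r`, and the helper names (`lqr_cost`, `lqr_grad`, `one_point`, `two_point`) are hypothetical, and the exact model-based cost stands in for the rollout-based oracle a model-free method would query. This is not code from the paper or from the jd-anderson/lqr_svrpg repository.

```python
import numpy as np

# Hypothetical toy system x_{t+1} = A x_t + B u_t with policy u_t = -K x_t.
A = np.array([[0.5, 0.1], [0.0, 0.6]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)        # state cost
R = np.eye(1)        # input cost
SIGMA0 = np.eye(2)   # covariance of the initial state x_0


def _lyapunov(AK, M):
    """Solve X = M + AK^T X AK through its vectorized (Kronecker) form."""
    n = AK.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(AK.T, AK.T), M.ravel())
    return vec.reshape(n, n)


def lqr_cost(K):
    """Exact cost C(K) = tr(P_K Sigma0); an assumed stand-in for a rollout-based oracle."""
    AK = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(AK))) >= 1.0:
        return np.inf  # K does not stabilize the system
    P = _lyapunov(AK, Q + K.T @ R @ K)
    return float(np.trace(P @ SIGMA0))


def lqr_grad(K):
    """Exact gradient 2[(R + B^T P B) K - B^T P A] Sigma_K, used only as a reference."""
    AK = A - B @ K
    P = _lyapunov(AK, Q + K.T @ R @ K)
    S = _lyapunov(AK.T, SIGMA0)  # Sigma_K = Sigma0 + AK Sigma_K AK^T
    return 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S


def sample_direction(rng, shape):
    """Draw U uniformly from the unit Frobenius sphere."""
    G = rng.standard_normal(shape)
    return G / np.linalg.norm(G)


def one_point(K, U, r):
    """One cost query: (d / r) C(K + rU) U."""
    return (K.size / r) * lqr_cost(K + r * U) * U


def two_point(K, U, r):
    """Two cost queries: (d / 2r) [C(K + rU) - C(K - rU)] U."""
    return (K.size / (2.0 * r)) * (lqr_cost(K + r * U) - lqr_cost(K - r * U)) * U


rng = np.random.default_rng(0)
K0 = np.zeros((1, 2))  # stabilizing here because A itself is stable
r, n_samples = 0.05, 4000
dirs = [sample_direction(rng, K0.shape) for _ in range(n_samples)]
g1 = np.array([one_point(K0, U, r) for U in dirs])
g2 = np.array([two_point(K0, U, r) for U in dirs])
g_true = lqr_grad(K0)

print("exact gradient:     ", g_true.ravel())
print("two-point average:  ", g2.mean(axis=0).ravel())
print("one-point variance: ", g1.var(axis=0).sum())
print("two-point variance: ", g2.var(axis=0).sum())
```

Both estimators target the gradient of the smoothed cost, but the one-point estimate carries the full cost value scaled by $d/r$, so on this toy problem its empirical variance is far larger than that of the two-point estimate. That gap is the trade-off a method mixing the two has to manage.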

Findings

01

Requires only $O(\log(1/\epsilon)^{\beta})$ two-point cost queries, for $\beta \in (0, 1)$

02

Converges linearly to the optimal solution in a model-free setting

03

Reduces the cost of gradient estimation in LQR problems
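Reading the bound in the first finding as $(\log(1/\epsilon))^{\beta}$, a quick evaluation shows why an exponent $\beta \in (0, 1)$ matters. The values $\epsilon = 10^{-3}, 10^{-6}$ and $\beta = 1/2$ below are illustrative assumptions, natural logarithms are used, and the constants hidden by the $O(\cdot)$ notation are ignored, so only the growth rate is meaningful.

```latex
\begin{aligned}
\epsilon = 10^{-3}:&\quad \log(1/\epsilon) \approx 6.91, \qquad (\log(1/\epsilon))^{1/2} \approx 2.63,\\
\epsilon = 10^{-6}:&\quad \log(1/\epsilon) \approx 13.82, \qquad (\log(1/\epsilon))^{1/2} \approx 3.72.
\end{aligned}
```

Tightening $\epsilon$ by three orders of magnitude doubles $\log(1/\epsilon)$ but multiplies $(\log(1/\epsilon))^{1/2}$ only by $\sqrt{2} \approx 1.41$.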

Abstract

We investigate the problem of learning an $\epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient methods have proven to converge linearly to the optimal solution of the model-free LQR problem, the substantial requirement for two-point cost queries in gradient estimations may be intractable, particularly in applications where obtaining cost function evaluations at two distinct control input configurations is exceptionally costly. To this end, we propose an oracle-efficient approach. Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm. It achieves an approximate optimal solution with only $O(\log(1/\epsilon)^{\beta})$ two-point cost information for $\beta \in (0, 1)$.
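A dual-loop variance-reduced scheme of this kind can be sketched in the style of SVRG. An outer loop computes a large-batch two-point estimate $\tilde{\mu}$ at a snapshot policy $\tilde{K}$. An inner loop then corrects it with differences of one-point estimates that reuse the same direction $U$ at the current iterate and at the snapshot, $v = (d/r)\,[C(K + rU) - C(\tilde{K} + rU)]\,U + \tilde{\mu}$, whose expectation is the smoothed gradient at $K$. This is a generic construction under assumed step size, batch sizes, and epoch counts, not a reproduction of the authors' algorithm or its parameters. The toy system and every helper name (`svrpg`, `two_point_batch`, `optimal_cost`) are hypothetical, and the exact cost again stands in for a rollout oracle.

```python
import numpy as np

# Hypothetical toy system and costs (u_t = -K x_t).
A = np.array([[0.5, 0.1], [0.0, 0.6]])
B = np.array([[0.0], [1.0]])
Q, R, SIGMA0 = np.eye(2), np.eye(1), np.eye(2)

QUERIES = {"one_point": 0, "two_point": 0}  # oracle-call bookkeeping


def lqr_cost(K):
    """Exact cost tr(P_K Sigma0); an assumed stand-in for a rollout-based oracle."""
    AK = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(AK))) >= 1.0:
        return np.inf  # K does not stabilize the system
    n = AK.shape[0]
    M = Q + K.T @ R @ K
    # Solve P = M + AK^T P AK in vectorized form.
    P = np.linalg.solve(np.eye(n * n) - np.kron(AK.T, AK.T), M.ravel()).reshape(n, n)
    return float(np.trace(P @ SIGMA0))


def optimal_cost(iters=500):
    """Reference optimum from Riccati value iteration (model-based, for checking only)."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return float(np.trace(P @ SIGMA0))


def sample_direction(rng, shape):
    G = rng.standard_normal(shape)
    return G / np.linalg.norm(G)  # uniform on the unit Frobenius sphere


def two_point_batch(K, rng, r, batch):
    """Averaged two-point estimate; each sample uses one two-point query."""
    g = np.zeros_like(K)
    for _ in range(batch):
        U = sample_direction(rng, K.shape)
        g += (K.size / (2.0 * r)) * (lqr_cost(K + r * U) - lqr_cost(K - r * U)) * U
        QUERIES["two_point"] += 1
    return g / batch


def svrpg(K0, rng, epochs=30, inner=5, eta=0.01, r=0.1, n_outer=100, n_inner=40):
    """SVRG-style dual loop with illustrative parameters (not the paper's)."""
    K_snap = K0.copy()
    for _ in range(epochs):
        # Outer loop: large-batch two-point estimate at the snapshot policy.
        mu = two_point_batch(K_snap, rng, r, n_outer)
        K = K_snap.copy()
        for _ in range(inner):
            corr = np.zeros_like(K)
            for _ in range(n_inner):
                U = sample_direction(rng, K.shape)
                # Difference of one-point estimates at K and K_snap with a shared U.
                corr += (K.size / r) * (lqr_cost(K + r * U) - lqr_cost(K_snap + r * U)) * U
                QUERIES["one_point"] += 2
            K = K - eta * (corr / n_inner + mu)  # variance-reduced gradient step
        K_snap = K  # the last inner iterate becomes the next snapshot
    return K_snap


rng = np.random.default_rng(1)
K0 = np.zeros((1, 2))
K_hat = svrpg(K0, rng)
c0, c_hat, c_star = lqr_cost(K0), lqr_cost(K_hat), optimal_cost()
print(f"C(K0) = {c0:.4f}, C(K_hat) = {c_hat:.4f}, C* = {c_star:.4f}")
print(QUERIES)
```

The query counter shows the intended accounting: two-point queries are spent only at snapshots, while each inner step relies on one-point evaluations.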

Code & Models

Repositories

jd-anderson/lqr_svrpg (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research