Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

Stefan Stojanovic; Yassir Jedra; Alexandre Proutiere

arXiv:2410.23434·cs.LG·November 12, 2024

TL;DR

This paper introduces LoRa-PI, a model-free reinforcement learning algorithm that efficiently learns near-optimal policies in low-rank structured systems using a novel leverage score-based matrix estimation method.

Contribution

The paper presents a new low-rank matrix estimation technique with entry-wise guarantees that do not depend on matrix coherence, enabling order-optimal sample complexity in reinforcement learning.

Findings

1. Achieves order-optimal sample complexity in learning policies.

2. Introduces leverage score-based sampling for matrix estimation.

3. Provides guarantees independent of matrix coherence.

Abstract

We consider the problem of learning an $ε$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the leverage scores of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method. For this leveraged matrix estimation procedure, we establish entry-wise…
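The two-phase procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`leverage_scores`, `truncated_pinv`, `leveraged_cur`), the `sample_entry` callback, the parameters `p_uniform` and `k`, and the choice to draw rows and columns with probability proportional to their estimated leverage scores are all assumptions made for illustration. The sketch also ignores noise handling (for example, averaging repeated samples of an entry) and the policy iteration loop around the estimator.

```python
import numpy as np


def leverage_scores(M_hat, rank):
    """Row and column leverage scores from the top-`rank` singular vectors."""
    U, _, Vt = np.linalg.svd(M_hat, full_matrices=False)
    row_lev = np.sum(U[:, :rank] ** 2, axis=1)   # sums to `rank`
    col_lev = np.sum(Vt[:rank, :] ** 2, axis=0)  # sums to `rank`
    return row_lev, col_lev


def truncated_pinv(W, rank):
    """Pseudo-inverse of W restricted to its top-`rank` singular directions."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (Vt[:rank].T / s[:rank]) @ U[:, :rank].T


def leveraged_cur(sample_entry, n_rows, n_cols, rank, p_uniform, k, rng):
    """Hypothetical two-phase estimator: spectral leverage scores, then CUR.

    sample_entry(i, j) is an assumed callback returning one (possibly noisy)
    observation of entry (i, j) of the unknown matrix.
    """
    # Phase 1: observe entries uniformly at random, rescale by the sampling
    # probability, and estimate leverage scores with a spectral method.
    mask = rng.random((n_rows, n_cols)) < p_uniform
    observed = np.zeros((n_rows, n_cols))
    for i, j in zip(*np.nonzero(mask)):
        observed[i, j] = sample_entry(i, j)
    row_lev, col_lev = leverage_scores(observed / p_uniform, rank)

    # Phase 2 (assumption): draw k rows and k columns with probability
    # proportional to their estimated leverage scores, then sample them fully.
    rows = rng.choice(n_rows, size=k, replace=False, p=row_lev / row_lev.sum())
    cols = rng.choice(n_cols, size=k, replace=False, p=col_lev / col_lev.sum())
    C = np.array([[sample_entry(i, j) for j in cols] for i in range(n_rows)])
    R = np.array([[sample_entry(i, j) for j in range(n_cols)] for i in rows])
    W = C[rows, :]  # intersection of the selected rows and columns

    # CUR-like completion: M is approximated by C W^+ R.
    return C @ truncated_pinv(W, rank) @ R
```

In the noiseless case, when the unknown matrix has rank exactly `rank` and the intersection block `W` has the same rank, the product `C @ truncated_pinv(W, rank) @ R` recovers the matrix exactly. Calling `leveraged_cur(lambda i, j: M[i, j], *M.shape, rank=2, p_uniform=1.0, k=4, rng=np.random.default_rng(0))` on a synthetic rank-2 matrix `M` is one way to check this behaviour.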


Videos

Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation · SlidesLive

Taxonomy

Topics: Advanced Adaptive Filtering Techniques · Muscle activation and electromyography studies · Elevator Systems and Control