Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning
Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere

TL;DR
This paper introduces spectral entry-wise matrix estimation methods tailored for low-rank reinforcement learning problems, enabling improved algorithms for bandits and MDPs with theoretical guarantees.
Contribution
It demonstrates that spectral methods can effectively recover low-rank matrices with low entry-wise error in RL settings, leading to new algorithms with optimal performance guarantees.
Findings
Spectral methods recover singular subspaces efficiently.
Entry-wise error is nearly minimal with these methods.
Algorithms achieve state-of-the-art guarantees in RL tasks.
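As a point of reference for the findings above, "entry-wise error" is usually measured in the max norm. The notation here is an assumption made for illustration (this summary does not fix any): M is the true matrix and \widehat{M} its estimate.

```latex
\|\widehat{M} - M\|_{\max} \;=\; \max_{i,j} \bigl|\widehat{M}_{ij} - M_{ij}\bigr|
\;\le\; \|\widehat{M} - M\|_{\mathrm{F}}
```

A bound on the Frobenius norm controls the error only on average over entries, whereas a bound on the max norm controls every entry individually, which is what matters when each entry is an arm reward or a transition probability.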
Abstract
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure. In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP. In both cases, each entry of the matrix carries important information, and we seek estimation methods with low entry-wise error. Importantly, these methods further need to accommodate inherent correlations in the available data (e.g., for MDPs, the data consists of system trajectories). We investigate the performance of simple spectral-based matrix estimation approaches: we show that they efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error. These new results on low-rank matrix estimation make it possible to devise reinforcement learning…
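The spectral approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fully observed matrix corrupted by independent Gaussian noise, which ignores the trajectory correlations the paper addresses. All names and parameter values (`rank_r_truncation`, `max_entrywise_error`, the dimensions, the noise level) are hypothetical choices made for the example.

```python
import numpy as np


def rank_r_truncation(observed, rank):
    """Return the best rank-`rank` approximation of `observed` (truncated SVD)."""
    u, s, vt = np.linalg.svd(observed, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def max_entrywise_error(estimate, truth):
    """Largest absolute difference over all entries (max norm of the error)."""
    return float(np.max(np.abs(estimate - truth)))


# Hypothetical problem sizes and noise level, chosen only for illustration.
rng = np.random.default_rng(0)
n, m, r, sigma = 200, 150, 3, 0.5

# Rank-r ground truth, e.g. a matrix of expected rewards in a low-rank bandit.
truth = rng.normal(size=(n, r)) @ rng.normal(size=(m, r)).T

# Assumption: i.i.d. Gaussian noise on every entry (no data correlations).
noisy = truth + sigma * rng.normal(size=(n, m))

estimate = rank_r_truncation(noisy, r)

print("max entry-wise error of raw observations:", max_entrywise_error(noisy, truth))
print("max entry-wise error after truncation:   ", max_entrywise_error(estimate, truth))
```

Projecting onto the top singular subspaces discards most of the noise, so the truncated estimate is closer to the truth in every entry than the raw observations are in this setting. The paper's contribution is proving guarantees of this kind when the data is correlated, which this sketch does not attempt.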
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization
