Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure
Tyler Sam, Yudong Chen, and Christina Lee Yu

TL;DR
This paper introduces algorithms for sample-efficient reinforcement learning in low-rank MDPs, overcoming the long horizon barrier by leveraging structural assumptions and generative models to achieve near-optimal sample complexity.
Contribution
It demonstrates that under stronger low-rank assumptions and access to a generative model, one can attain minimax optimal sample complexity without requiring known feature mappings.
Findings
LR-MCPI and LR-EVI algorithms achieve optimal sample complexity
Without additional assumptions, learning can require a number of samples exponential in the horizon
Results hold for long time horizons and do not need known feature mappings
Abstract
The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\tilde{\Omega}(|S||A|H^3/\epsilon^2)$ over worst case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs for which the associated optimal $Q^*$ function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in $|S|$ and $|A|$ due to the low rank structure, we show that without imposing further assumptions beyond low rank of $Q^*$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near optimal policy. We…
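To make the idea of estimating a low-rank matrix "using only observations from a subset of entries" concrete, here is a minimal sketch in Python with NumPy. It is not LR-MCPI or LR-EVI, and it is not taken from the paper: it assumes an exactly rank-r matrix, noiseless observations, and a hypothetical choice of "anchor" rows and columns. All names (`reconstruct_low_rank`, `anchor_states`, `anchor_actions`) are illustrative.

```python
import numpy as np

def reconstruct_low_rank(q_cols, q_rows, q_core, rank):
    """Estimate a full (states x actions) matrix from a cross-shaped subset of entries.

    q_cols: (num_states, k) entries for all states at k anchor actions
    q_rows: (k, num_actions) entries for k anchor states at all actions
    q_core: (k, k) entries at anchor states and anchor actions
    """
    # Rank-truncated pseudo-inverse of the core block: V diag(1/s) U^T
    u, s, vt = np.linalg.svd(q_core, full_matrices=False)
    core_pinv = (vt[:rank].T / s[:rank]) @ u[:, :rank].T
    return q_cols @ core_pinv @ q_rows

rng = np.random.default_rng(0)
num_states, num_actions, rank = 50, 40, 3

# Synthetic stand-in for a low-rank Q matrix (assumption: exactly rank 3)
Q = rng.normal(size=(num_states, rank)) @ rng.normal(size=(rank, num_actions))

# Hypothetical anchors: a few more than the rank, chosen at random
k = 2 * rank
anchor_states = rng.choice(num_states, size=k, replace=False)
anchor_actions = rng.choice(num_actions, size=k, replace=False)

# Only these entries are "observed"
q_cols = Q[:, anchor_actions]
q_rows = Q[anchor_states, :]
q_core = Q[np.ix_(anchor_states, anchor_actions)]

Q_hat = reconstruct_low_rank(q_cols, q_rows, q_core, rank)
num_observed = q_cols.size + q_rows.size - q_core.size
max_error = np.max(np.abs(Q - Q_hat))
print(f"observed {num_observed} of {Q.size} entries, max error {max_error:.2e}")
```

In this idealized setting the number of observed entries grows linearly in the number of states and actions rather than with their product. The hardness result quoted in the abstract concerns what can go wrong with this kind of estimation over a long horizon when only low rank of $Q^*$ is assumed, which this noiseless single-matrix sketch does not capture.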
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
