Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

Tyler Sam, Yudong Chen, and Christina Lee Yu

arXiv:2206.03569·cs.LG·June 12, 2023

TL;DR

This paper introduces algorithms for sample-efficient reinforcement learning in MDPs with latent low-rank structure. By combining low-rank structural assumptions with access to a generative model, the algorithms overcome the long-horizon barrier and achieve near-optimal sample complexity.

Contribution

The paper demonstrates that, under stronger low-rank assumptions and with access to a generative model, minimax optimal sample complexity is attainable without known feature mappings.

Findings

01 LR-MCPI and LR-EVI achieve optimal sample complexity

02 Without assumptions beyond low rank of $Q^{*}$, learning can require a number of samples exponential in the horizon $H$

03 Results hold for long time horizons and do not require known feature mappings

Abstract

The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\tilde{\Omega}(|S||A|H^{3}/\epsilon^{2})$ over worst case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs for which the associated optimal $Q^{*}$ function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in $|S|$ and $|A|$ due to the low rank structure, we show that without imposing further assumptions beyond low rank of $Q^{*}$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near optimal policy. We…
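To make the phrase "estimate the $Q$ function using only observations from a subset of entries" concrete, here is a minimal noise-free sketch in Python. It is an illustration only, not LR-MCPI or LR-EVI: it assumes a hypothetical exactly rank-$d$ matrix standing in for $Q^{*}$ at a single step, and it assumes that the $d$ randomly chosen "anchor" states and actions give an invertible $d \times d$ intersection. All names (`reconstruct_from_anchors`, `anchor_states`, `anchor_actions`) are invented for this example.

```python
import numpy as np

def reconstruct_from_anchors(Q_cols, Q_rows, Q_core):
    """Rebuild a rank-d matrix from d observed columns, d observed rows,
    and the d x d block where those rows and columns intersect."""
    return Q_cols @ np.linalg.pinv(Q_core) @ Q_rows

rng = np.random.default_rng(0)
n_states, n_actions, d = 200, 50, 3

# Hypothetical rank-d matrix standing in for Q* at one time step.
U = rng.normal(size=(n_states, d))
V = rng.normal(size=(n_actions, d))
Q = U @ V.T

# Assumption: d anchor states and d anchor actions chosen at random.
anchor_states = rng.choice(n_states, size=d, replace=False)
anchor_actions = rng.choice(n_actions, size=d, replace=False)

# Only these entries are "observed".
Q_cols = Q[:, anchor_actions]                      # n_states x d
Q_rows = Q[anchor_states, :]                       # d x n_actions
Q_core = Q[np.ix_(anchor_states, anchor_actions)]  # d x d

Q_hat = reconstruct_from_anchors(Q_cols, Q_rows, Q_core)

# Number of distinct observed entries: linear in |S| + |A| for fixed d.
n_observed = d * (n_states + n_actions) - d * d
max_error = np.max(np.abs(Q_hat - Q))

print(f"observed {n_observed} of {n_states * n_actions} entries")
print(f"max reconstruction error: {max_error:.2e}")
```

In this idealized setting the observed entry count grows with $|S| + |A|$ rather than $|S||A|$, which is the kind of saving the low rank structure suggests. The sketch has no noise and no horizon, so it says nothing about the exponential-in-$H$ worst case described in the abstract.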

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques