Reinforcement Learning in Low-Rank MDPs with Density Features
Audrey Huang, Jinglin Chen, Nan Jiang

TL;DR
This paper develops sample-efficient reinforcement learning algorithms for low-rank MDPs using density features, enabling effective occupancy estimation and exploration even with non-exploratory data and unknown features.
Contribution
It introduces novel algorithms for offline and online RL in low-rank MDPs with density features, addressing occupancy estimation and exploration challenges.
Findings
The offline algorithm estimates occupancies even when the data is non-exploratory.
The approach extends to representation learning when the density features are unknown.
The algorithms avoid exponential error blow-up without strong assumptions.
Abstract
MDPs with low-rank transitions -- that is, the transition matrix can be factored into the product of two matrices, left and right -- are a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a…
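To make the role of the right matrix concrete, the sketch below builds a small tabular low-rank MDP and checks that the next-step state occupancy is a linear combination of the density features. It is a toy illustration, not the paper's algorithm: the names `phi` (left factor), `mu` (right factor, the density features), `pi`, and `d0`, along with the sizes and the random construction, are assumptions chosen for the example.

```python
import random


def random_simplex(n, rng):
    """Return a random probability vector of length n."""
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def build_low_rank_mdp(num_states, num_actions, rank, rng):
    """Hypothetical construction: phi[s][a] is a distribution over `rank`
    latent factors, and mu[k] is a distribution over next states."""
    phi = [[random_simplex(rank, rng) for _ in range(num_actions)]
           for _ in range(num_states)]
    mu = [random_simplex(num_states, rng) for _ in range(rank)]
    return phi, mu


def transition(phi, mu, s, a):
    """P(. | s, a) = sum_k phi[s][a][k] * mu[k][.] (the low-rank factorization)."""
    return [sum(phi[s][a][k] * mu[k][t] for k in range(len(mu)))
            for t in range(len(mu[0]))]


def next_occupancy_direct(d, pi, phi, mu):
    """Push the state occupancy d one step forward through the full transition."""
    out = [0.0] * len(d)
    for s in range(len(d)):
        for a in range(len(pi[s])):
            p = transition(phi, mu, s, a)
            for t in range(len(d)):
                out[t] += d[s] * pi[s][a] * p[t]
    return out


def occupancy_weights(d, pi, phi):
    """Expected left feature under (d, pi): a vector with `rank` entries."""
    rank = len(phi[0][0])
    w = [0.0] * rank
    for s in range(len(d)):
        for a in range(len(pi[s])):
            for k in range(rank):
                w[k] += d[s] * pi[s][a] * phi[s][a][k]
    return w


def next_occupancy_from_features(w, mu):
    """Next occupancy written as a linear combination of the density features."""
    return [sum(w[k] * mu[k][t] for k in range(len(mu)))
            for t in range(len(mu[0]))]


rng = random.Random(0)
phi, mu = build_low_rank_mdp(num_states=5, num_actions=2, rank=3, rng=rng)
d0 = random_simplex(5, rng)                       # current state occupancy
pi = [random_simplex(2, rng) for _ in range(5)]   # stochastic policy

direct = next_occupancy_direct(d0, pi, phi, mu)
w = occupancy_weights(d0, pi, phi)
via_features = next_occupancy_from_features(w, mu)
max_gap = max(abs(x - y) for x, y in zip(direct, via_features))
print("largest difference between the two computations:", max_gap)
```

Under these assumptions the two computations agree up to floating-point error, so the occupancy over all states is determined by only `rank` weights. This is the sense in which density features give a compact model class for state-occupancy distributions.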
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics
