Reinforcement Learning in Low-Rank MDPs with Density Features

Audrey Huang; Jinglin Chen; Nan Jiang

arXiv:2302.02252·cs.LG·February 7, 2023

TL;DR

This paper develops sample-efficient reinforcement learning algorithms for low-rank MDPs using density features, enabling effective occupancy estimation and exploration even with non-exploratory data and unknown features.

Contribution

It introduces novel algorithms for offline and online RL in low-rank MDPs with density features, addressing occupancy estimation and exploration challenges.

Findings

01 The offline occupancy-estimation algorithm handles non-exploratory data.

02 The results extend to representation learning, where the density features are unknown.

03 The analysis avoids exponential error blow-up without strong assumptions.

Abstract

MDPs with low-rank transitions -- that is, MDPs whose transition matrix can be factored into the product of two matrices, left and right -- form a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a…
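The factorization described in the abstract can be illustrated with a small numerical sketch. It is not the paper's algorithm; it only shows, in a tabular toy setting, why the right matrix yields a model for state occupancies: if P(s'|s,a) = <phi(s,a), mu(s')>, then the next-step occupancy of any policy is a linear combination of the d density features. All sizes, the random simplex-valued features, and names such as `next_occupancy` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 2, 3  # hypothetical sizes: states, actions, rank

# Left matrix phi: one row per (s, a), a distribution over d latent factors.
phi = rng.dirichlet(np.ones(d), size=S * A)   # shape (S*A, d)
# Right matrix mu (density features): each row is a distribution over s'.
mu = rng.dirichlet(np.ones(S), size=d)        # shape (d, S)

# Low-rank transition matrix: P[(s, a), s'] = <phi(s, a), mu(:, s')>.
P = phi @ mu                                  # shape (S*A, S)

# An arbitrary stochastic policy pi[s, a] and initial state distribution d0.
pi = rng.dirichlet(np.ones(A), size=S)        # shape (S, A)
d0 = rng.dirichlet(np.ones(S))                # shape (S,)


def next_occupancy(d_state):
    """One-step update d'(s') = sum_{s,a} d(s) pi(a|s) P(s'|s,a).

    Returns the next state occupancy and its weight vector theta in R^d,
    so that d' = theta @ mu (linear in the density features).
    """
    d_sa = (d_state[:, None] * pi).reshape(S * A)  # state-action occupancy
    theta = d_sa @ phi                             # d-dimensional weights
    return theta @ mu, theta


d1, theta = next_occupancy(d0)
```

Here `theta` has only d entries regardless of the number of states, which is the sense in which density features give a compact occupancy model in this toy setting.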

Videos

Reinforcement Learning in Low-rank MDPs with Density Features · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics