Representation Learning for Online and Offline RL in Low-rank MDPs

Masatoshi Uehara; Xuezhou Zhang; Wen Sun

arXiv:2110.04652 · cs.LG · January 7, 2022 · 5 cites

TL;DR

This paper introduces a new algorithm for representation learning in low-rank MDPs that improves sample efficiency and simplifies the process for both online and offline RL settings.

Contribution

It proposes REP-UCB, a simpler and more sample-efficient algorithm for representation learning in low-rank MDPs, applicable to online and offline RL.

Findings

01

REP-UCB significantly improves on the sample complexity of FLAMBE

02

It simplifies the representation learning process in RL

03

The offline algorithm leverages pessimism under partial coverage

Abstract

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of the representation, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample…
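To make the low-rank structure mentioned in the abstract concrete, here is a minimal sketch of a tabular MDP whose transition kernel factors as P(s' | s, a) = <phi(s, a), mu(s')>. It is an illustration, not the paper's algorithm. The function names (random_simplex, build_low_rank_mdp, transition) are hypothetical, and the code assumes the special case where phi(s, a) is a distribution over d latent factors and each mu[k] is a distribution over next states, which guarantees valid probabilities. In the representation learning problem, neither phi nor mu is given to the learner.

```python
import random

def random_simplex(n, rng):
    """Return n positive weights that sum to 1."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def build_low_rank_mdp(n_states, n_actions, d, seed=0):
    """Sample the two factors of a rank-d transition kernel (illustrative assumption)."""
    rng = random.Random(seed)
    # phi[(s, a)]: d-dimensional feature, here a distribution over d latent factors
    phi = {(s, a): random_simplex(d, rng)
           for s in range(n_states) for a in range(n_actions)}
    # mu[k]: distribution over next states associated with latent factor k
    mu = [random_simplex(n_states, rng) for _ in range(d)]
    return phi, mu

def transition(phi, mu, s, a):
    """Compute P(. | s, a) as the inner product of phi(s, a) with mu(.)."""
    feat = phi[(s, a)]
    n_states = len(mu[0])
    return [sum(feat[k] * mu[k][s2] for k in range(len(mu)))
            for s2 in range(n_states)]

phi, mu = build_low_rank_mdp(n_states=6, n_actions=2, d=3)
row = transition(phi, mu, s=0, a=1)
```

Stacking these rows over all (s, a) pairs gives a 12 x 6 transition matrix that is the product of a 12 x 3 and a 3 x 6 matrix, so its rank is at most d = 3.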

Videos

Representation Learning for Online and Offline RL in Low-rank MDPs · slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms