Representation Learning for Online and Offline RL in Low-rank MDPs
Masatoshi Uehara, Xuezhou Zhang, Wen Sun

TL;DR
This paper introduces a new algorithm for representation learning in low-rank MDPs that improves sample efficiency and simplifies the process for both online and offline RL settings.
Contribution
It proposes REP-UCB, a simpler and more sample-efficient algorithm for representation learning in low-rank MDPs, applicable to online and offline RL.
Findings
REP-UCB significantly improves on the sample complexity of FLAMBE, the prior state-of-the-art algorithm
It simplifies the representation learning process in RL
The offline algorithm leverages pessimism under partial coverage
Abstract
This work studies the question of representation learning in RL: how can we learn a compact low-dimensional representation such that, on top of the representation, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample…
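To make the low-rank assumption in the abstract concrete, here is a minimal sketch, not taken from the paper, of a transition matrix that factorizes as P(s' | s, a) = <phi(s, a), mu(s')> with latent dimension d. For simplicity it assumes the special case where each phi(s, a) is a probability vector over d latent factors and each factor has its own next-state distribution; the names low_rank_transition, random_distribution, and matrix_rank are hypothetical helpers introduced only for illustration. In the setting studied here, phi and mu are both unknown and must be learned.

```python
from fractions import Fraction
import random


def random_distribution(n, rng):
    """Return a length-n probability vector with exact rational entries."""
    weights = [rng.randint(1, 9) for _ in range(n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def low_rank_transition(num_sa, num_next, d, seed=0):
    """Build P[(s, a)][s'] = <phi(s, a), mu(s')> with latent dimension d.

    Assumption (illustrative only): phi rows and mu rows are probability
    vectors, which guarantees every row of P is a valid distribution.
    """
    rng = random.Random(seed)
    # phi: each (s, a) pair maps to a distribution over d latent factors.
    phi = [random_distribution(d, rng) for _ in range(num_sa)]
    # mu: each latent factor maps to a distribution over next states.
    mu = [random_distribution(num_next, rng) for _ in range(d)]
    P = [
        [sum(phi[i][k] * mu[k][j] for k in range(d)) for j in range(num_next)]
        for i in range(num_sa)
    ]
    return phi, mu, P


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# 12 state-action pairs, 10 next states, latent dimension 3.
phi, mu, P = low_rank_transition(num_sa=12, num_next=10, d=3)
rows_are_distributions = all(sum(row) == 1 for row in P)
rank_of_P = matrix_rank(P)
print(rows_are_distributions, rank_of_P <= 3)  # prints: True True
```

Although P has 12 rows and 10 columns, its rank is at most d = 3, which is the structure a representation learning algorithm can exploit once it recovers a suitable phi.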
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
