Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
Yuan Cheng, Jing Yang, Yingbin Liang

TL;DR
This paper introduces two algorithms, PORTAL and Ada-PORTAL, for nonstationary low-rank MDPs, and shows that both are sample-efficient while handling unknown representations and changing environments in reinforcement learning.
Contribution
The paper is the first to analyze nonstationary RL under low-rank MDPs with unknown representations, and it proposes adaptive algorithms with theoretical guarantees.
Findings
Both algorithms achieve a small average dynamic suboptimality gap.
They are sample-efficient under mild nonstationarity.
Ada-PORTAL adaptively tunes hyper-parameters without prior knowledge.
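To make "average dynamic suboptimality gap" concrete, here is a minimal numerical sketch. It is not PORTAL or Ada-PORTAL, and it is not taken from the paper. It assumes a tiny finite MDP whose transition kernel factors as P(s'|s,a) = <phi(s,a), mu(s')>, with factors that are known and drift linearly across rounds. For simplicity the kernel is the same at every step within a round. All names (`low_rank_kernel`, `backward_induction`, `average_dynamic_gap`) and all sizes are hypothetical. The gap is computed as the per-round difference between the optimal value and the value of the policy actually played, averaged over rounds.

```python
import numpy as np

def low_rank_kernel(phi, mu):
    """Build P[s, a, s'] = <phi[s, a], mu[s']> from rank-d factors."""
    return np.einsum("sad,td->sat", phi, mu)

def backward_induction(P, r, H, policy=None):
    """Return (V, pi) for an H-step episode.

    If policy is None, act greedily (optimal); otherwise evaluate the
    given step-indexed policy, where policy[h, s] is an action index.
    """
    S = P.shape[0]
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P @ V  # Q[s, a] = r[s, a] + sum_s' P[s, a, s'] * V[s']
        pi[h] = Q.argmax(axis=1) if policy is None else policy[h]
        V = Q[np.arange(S), pi[h]]
    return V, pi

def average_dynamic_gap(kernels, r, H, policies, s0=0):
    """Mean over rounds of (optimal value - value of the played policy)."""
    gaps = []
    for P, pi in zip(kernels, policies):
        v_star, _ = backward_induction(P, r, H)
        v_pi, _ = backward_induction(P, r, H, policy=pi)
        gaps.append(v_star[s0] - v_pi[s0])
    return float(np.mean(gaps))

# Illustrative sizes (assumptions): states, actions, rank, horizon, rounds.
rng = np.random.default_rng(0)
S, A, d, H, K = 4, 2, 2, 3, 5

# phi[s, a] lies on the simplex and each column of mu is a distribution
# over next states, so every P[s, a, :] is a valid distribution.
phi = rng.dirichlet(np.ones(d), size=(S, A))   # shape (S, A, d)
mu_a = rng.dirichlet(np.ones(S), size=d).T     # shape (S, d)
mu_b = rng.dirichlet(np.ones(S), size=d).T
r = rng.uniform(size=(S, A))

# Nonstationarity: mu drifts linearly from mu_a to mu_b over the K rounds.
kernels = [low_rank_kernel(phi, (1 - w) * mu_a + w * mu_b)
           for w in np.linspace(0.0, 1.0, K)]

# A "stale" learner keeps the round-0 optimal policy; an oracle re-plans.
_, pi0 = backward_induction(kernels[0], r, H)
gap_stale = average_dynamic_gap(kernels, r, H, [pi0] * K)
gap_oracle = average_dynamic_gap(
    kernels, r, H, [backward_induction(P, r, H)[1] for P in kernels])
```

In this toy setting the oracle's gap is zero by construction, while the stale policy's gap is nonnegative and depends on how far the environment drifts. The paper's setting is harder because the representation is unknown and must be learned from samples.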
Abstract
Reinforcement learning (RL) under a changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence has gained considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains an unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version, Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of…
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Smart Grid Energy Management
