Robust Offline Reinforcement Learning for Non-Markovian Decision Processes
Ruiquan Huang, Yingbin Liang, Jing Yang

TL;DR
This paper introduces a new algorithm for robust offline non-Markovian reinforcement learning that leverages low-rank structures and confidence bounds to achieve sample-efficient policy learning under uncertainty.
Contribution
The paper proposes a new algorithm that combines dataset distillation with a lower confidence bound (LCB) design, and derives new dual forms and concentrability coefficients for robust non-Markovian RL.
Findings
Achieves $O(1/\epsilon^2)$ sample complexity for low-rank models.
Extends to non-structured models with polynomial sample efficiency.
Provides dual forms for robust value estimation in non-Markovian settings.
Abstract
Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs best under the worst environment within an uncertainty set, using an offline dataset collected from a nominal model. While recent advances in robust RL focus on Markov decision processes (MDPs), robust non-Markovian RL has been limited to the planning problem, where the transitions in the uncertainty set are known. In this paper, we study the learning problem of robust offline non-Markovian RL. Specifically, when the nominal model admits a low-rank structure, we propose a new algorithm featuring a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets. We also derive new dual forms for these robust values in non-Markovian RL, making our algorithm more amenable to practical implementation. By further introducing a novel…
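To illustrate why a dual form makes a robust value easier to compute, here is a minimal Python sketch for a much simpler setting than the paper's: a single finite distribution with a total-variation (TV) uncertainty set of radius rho around a nominal distribution p0. The choice of TV, the function names, and the toy numbers are all assumptions made for illustration; the summary above does not state which uncertainty sets or dual forms the paper uses, and this is the standard one-step TV dual, not the paper's non-Markovian result. The primal version searches over distributions directly, while the dual reduces the problem to a one-dimensional maximization over a clipping threshold alpha.

```python
from typing import Sequence


def worst_case_mean_primal(p0: Sequence[float], f: Sequence[float], rho: float) -> float:
    """Worst-case mean of f over {p : TV(p, p0) <= rho}, with TV = 0.5 * L1.

    Greedy solution: move up to rho probability mass from the
    highest-value outcomes onto the lowest-value outcome.
    """
    p = list(p0)
    lo = min(range(len(f)), key=lambda i: f[i])  # index of the lowest value
    budget = rho
    for i in sorted(range(len(f)), key=lambda i: f[i], reverse=True):
        if i == lo or budget <= 0:
            continue
        moved = min(p[i], budget)
        p[i] -= moved
        p[lo] += moved
        budget -= moved
    return sum(pi * fi for pi, fi in zip(p, f))


def worst_case_mean_dual(p0: Sequence[float], f: Sequence[float], rho: float) -> float:
    """Same quantity via the TV dual:

        max over alpha of  E_{p0}[min(f, alpha)] - rho * (alpha - min f).

    The objective is concave and piecewise linear in alpha with kinks at
    the values of f, so it suffices to evaluate it at those values.
    """
    f_min = min(f)

    def objective(alpha: float) -> float:
        clipped_mean = sum(pi * min(fi, alpha) for pi, fi in zip(p0, f))
        return clipped_mean - rho * (alpha - f_min)

    return max(objective(alpha) for alpha in set(f))


if __name__ == "__main__":
    # Hypothetical toy numbers, chosen only for illustration.
    nominal = [0.2, 0.3, 0.5]
    values = [0.0, 1.0, 2.0]
    radius = 0.6
    print(worst_case_mean_primal(nominal, values, radius))
    print(worst_case_mean_dual(nominal, values, radius))
```

On this toy instance the two functions agree up to floating-point error. The dual only needs expectations under the nominal distribution, which is the property that makes dual forms attractive when only samples from the nominal model are available.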
Taxonomy
Topics: Supply Chain and Inventory Management
Methods: Sparse Evolutionary Training · Focus
