Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Ruiquan Huang; Yingbin Liang; Jing Yang

arXiv:2411.07514·cs.LG·January 7, 2025

Open Access

TL;DR

This paper introduces a new algorithm for robust offline non-Markovian reinforcement learning that leverages low-rank structures and confidence bounds to achieve sample-efficient policy learning under uncertainty.

Contribution

It proposes a novel algorithm with dataset distillation and confidence bounds, along with new dual forms and concentrability coefficients for robust non-Markovian RL.

Findings

01

Achieves $O(1/\epsilon^2)$ sample complexity for low-rank models.

02

Extends to non-structured models with polynomial sample efficiency.

03

Provides dual forms for robust value estimation in non-Markovian settings.

Abstract

Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs best under the worst-case environment within an uncertainty set, using an offline dataset collected from a nominal model. While recent advances in robust RL focus on Markov decision processes (MDPs), robust non-Markovian RL is limited to the planning problem, where the transitions in the uncertainty set are known. In this paper, we study the learning problem of robust offline non-Markovian RL. Specifically, when the nominal model admits a low-rank structure, we propose a new algorithm featuring a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets. We also derive new dual forms for these robust values in non-Markovian RL, making our algorithm more amenable to practical implementation. By further introducing a novel…
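
To give a feel for why dual forms make robust values easier to compute, here is a minimal sketch of the classical dual for a worst-case expectation over a KL-divergence ball around a nominal distribution: the infimum over distributions equals the supremum over λ > 0 of -λ log E[exp(-V/λ)] - λρ. This is the standard single-step textbook result, not the paper's non-Markovian dual forms, and the function name `kl_robust_expectation`, the radius `radius`, and the grid search over `lambdas` are all assumptions made for illustration.

```python
import math

def kl_robust_expectation(values, nominal_probs, radius, lambdas=None):
    """Approximate the worst-case expectation of `values` over all
    distributions within KL divergence `radius` of `nominal_probs`,
    using the one-dimensional dual instead of optimising over distributions.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    if lambdas is None:
        # Crude log-spaced grid for the dual variable (an assumption;
        # a real implementation would use a 1-D concave maximiser).
        lambdas = [10 ** (k / 20) for k in range(-80, 81)]

    support = [(v, p) for v, p in zip(values, nominal_probs) if p > 0]
    v_min = min(v for v, _ in support)

    # As lambda -> 0 the dual objective tends to the smallest value on the
    # support, which is always a valid lower bound on the robust value.
    best = v_min
    for lam in lambdas:
        # Shift by v_min so the exponentials cannot overflow.
        s = sum(p * math.exp(-(v - v_min) / lam) for v, p in support)
        dual = v_min - lam * math.log(s) - lam * radius
        best = max(best, dual)
    return best

# Usage: a two-outcome value function under a fair nominal distribution.
nominal_value = kl_robust_expectation([1.0, 0.0], [0.5, 0.5], radius=0.0)
robust_value = kl_robust_expectation([1.0, 0.0], [0.5, 0.5], radius=0.1)
```

Every grid point yields a lower bound on the true robust value, so a coarse grid can only make the estimate more pessimistic. With `radius=0.0` the result approaches the nominal mean, and it shrinks toward the worst outcome as the radius grows.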

Taxonomy

Topics: Supply Chain and Inventory Management

Methods: Sparse Evolutionary Training · Focus