Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

Johannes Ackermann; Takayuki Osa; Masashi Sugiyama

arXiv:2405.14114·cs.LG·May 29, 2024

TL;DR

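This paper introduces an offline reinforcement learning approach for datasets with structured non-stationarity, where the transition and reward functions change gradually between episodes but stay constant within each episode. It reports strong results on simple continuous control tasks and on high-dimensional locomotion tasks.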

Contribution

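The paper defines a new offline RL problem setting with structured non-stationarity and proposes a Contrastive Predictive Coding-based method that identifies the non-stationarity in the dataset, accounts for it when training a policy, and predicts it during evaluation.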
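Contrastive Predictive Coding is typically trained with the InfoNCE loss, which scores one positive sample against a set of negatives. The snippet below is a minimal, generic sketch of that loss for a single query, not the paper's exact objective. The function name `info_nce`, the dot-product scoring, and the `temperature` parameter are assumptions made for illustration.

```python
import math


def info_nce(query, keys, positive_index, temperature=1.0):
    """InfoNCE loss for one query vector (illustrative sketch).

    Scores every key against the query with a dot product (an assumption;
    other score functions are possible) and returns the cross-entropy of
    picking the positive key among all keys.
    """
    scores = [
        sum(q * k for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[positive_index]


# The loss is lower when the positive key is the one aligned with the query.
aligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=0)
misaligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=1)
```

When all keys score equally, the loss equals the log of the number of keys, which is a useful sanity check for any implementation.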

Findings

01

Performs well in simple continuous control tasks

02

Often achieves oracle performance

03

Outperforms baseline methods

Abstract

Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode. We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation. We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks. We show that our method often achieves the oracle performance and performs better than baselines.
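The problem setting in the abstract can be illustrated with a toy data-collection loop. This is a sketch under stated assumptions, not the paper's environments: the 1-D dynamics, the reward, the linear `drift`, and the name `collect_dataset` are all hypothetical. The hidden parameter is stored only for inspection; a learner in this setting would not observe it.

```python
import random


def collect_dataset(num_episodes=5, steps_per_episode=4, drift=0.1, seed=0):
    """Collect transitions from a toy 1-D environment (hypothetical).

    A hidden parameter shifts both the transition and the reward function.
    It changes gradually between episodes and is constant within an episode.
    """
    rng = random.Random(seed)
    hidden = 0.0  # hypothetical hidden parameter, e.g. a constant push
    dataset = []
    for episode in range(num_episodes):
        state = 0.0
        for _ in range(steps_per_episode):
            action = rng.uniform(-1.0, 1.0)  # random behavior policy
            next_state = state + action + hidden  # transition depends on hidden
            reward = -abs(next_state - hidden)  # reward depends on hidden
            dataset.append({
                "episode": episode,
                "hidden": hidden,  # kept for inspection only
                "state": state,
                "action": action,
                "reward": reward,
                "next_state": next_state,
            })
            state = next_state
        hidden += drift  # gradual change between episodes only
    return dataset


data = collect_dataset(num_episodes=3, steps_per_episode=2, drift=0.5, seed=1)
```

Grouping `data` by the `episode` key shows a single hidden value per episode that moves by `drift` from one episode to the next, which is the structure a method in this setting has to infer from the transitions alone.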

Code & Models

Repositories

johannesack/offlinerlstructurednonstationarity
JAX · Official

Datasets

johannesack/OfflineRLStructuredNonstationary
dataset · 8 dl

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management

Methods: InfoNCE · Contrastive Predictive Coding