Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

TL;DR
This paper introduces an offline reinforcement learning approach for datasets whose transition and reward functions change between episodes in a structured way, and reports that it often matches oracle performance on continuous control and locomotion tasks.
Contribution
The paper defines a new offline RL problem setting, structured non-stationarity in the dataset, and proposes a Contrastive Predictive Coding-based method that identifies and adapts to it.
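Contrastive Predictive Coding is commonly trained with the InfoNCE objective, which scores each query against one matching key and treats the other keys in the batch as negatives. The following is a minimal generic sketch of that loss in NumPy, not the authors' implementation; the function name `info_nce_loss`, the cosine-similarity scoring, and the `temperature` value are assumptions chosen for illustration.

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's code).

    Row i of `queries` is the positive partner of row i of `keys`;
    every other row of `keys` serves as a negative for that query.
    """
    # L2-normalise so that scores are cosine similarities (an assumption).
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal entries as the correct class.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce_loss(z, z)                       # positives match
shuffled = info_nce_loss(z, np.roll(z, 1, axis=0))  # positives misaligned
```

Here `aligned` is smaller than `shuffled`, because the loss is lowest when each query is most similar to its own key.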
Findings
Performs well in simple continuous control tasks and in challenging, high-dimensional locomotion tasks
Often achieves oracle performance
Outperforms baseline methods
Abstract
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode. We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation. We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks. We show that our method often achieves the oracle performance and performs better than baselines.
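The problem setting described above can be illustrated with a toy dataset generator. This is only a sketch under stated assumptions: the 1-D dynamics, the scalar `hidden` parameter, its sinusoidal schedule, and the name `collect_dataset` are all hypothetical. The source says only that the transition and reward functions change gradually between episodes and stay constant within each episode.

```python
import numpy as np

def collect_dataset(num_episodes=5, steps_per_episode=3, seed=0):
    """Toy 1-D dataset whose dynamics drift between episodes, not within them."""
    rng = np.random.default_rng(seed)
    dataset = []
    for episode in range(num_episodes):
        # Hypothetical hidden parameter: fixed inside an episode,
        # evolving smoothly from one episode to the next.
        hidden = np.sin(0.5 * episode)
        state = 0.0
        for _ in range(steps_per_episode):
            action = rng.uniform(-1.0, 1.0)       # stand-in behavior policy
            next_state = state + action + hidden  # transition depends on hidden
            reward = -abs(next_state - hidden)    # reward depends on hidden
            # The hidden parameter itself is never stored in the dataset.
            dataset.append((episode, state, action, reward, next_state))
            state = next_state
    return dataset

data = collect_dataset()
```

Because `hidden` is not recorded, a learner sees only transitions grouped by episode and has to infer the per-episode change from them, which is the role the abstract assigns to the contrastive representation.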
Taxonomy
Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management
Methods: InfoNCE · Contrastive Predictive Coding
