Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Zhang-Wei Hong; Aviral Kumar; Sathwik Karnik; Abhishek Bhandwaldar; Akash Srivastava; Joni Pajarinen; Romain Laroche; Abhishek Gupta; Pulkit Agrawal

arXiv:2310.04413 · cs.LG · October 13, 2023 · 2 cites

TL;DR

This paper introduces a novel sampling strategy for offline reinforcement learning that focuses on leveraging good data in imbalanced datasets, leading to significant performance improvements over existing methods.

Contribution

It proposes a new sampling approach that constrains policies to good data rather than all data, addressing suboptimal data dominance in offline RL.
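As an illustration of sampling good data more often than uniform sampling would, the sketch below weights trajectories by their return with a softmax. This is an assumption made for illustration only, not the paper's method: the function names `return_weights` and `sample_trajectories`, the `temperature` parameter, and the toy returns are all hypothetical.

```python
import math
import random


def return_weights(returns, temperature=1.0):
    """Softmax weights over trajectory returns.

    Hypothetical weighting scheme for illustration; not taken from the paper.
    """
    lo, hi = min(returns), max(returns)
    scale = (hi - lo) or 1.0  # avoid division by zero when all returns are equal
    # Normalize returns to [0, 1], then sharpen with the temperature.
    logits = [(r - lo) / scale / temperature for r in returns]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectories(returns, batch_size, temperature=1.0, seed=0):
    """Draw trajectory indices with probability proportional to their weight."""
    rng = random.Random(seed)
    weights = return_weights(returns, temperature)
    return rng.choices(range(len(returns)), weights=weights, k=batch_size)


# Toy imbalanced dataset (made up): 95 low-return and 5 high-return trajectories.
returns = [10.0] * 95 + [100.0] * 5
weights = return_weights(returns, temperature=0.1)

# Expected return of a sampled trajectory under each sampling scheme.
uniform_mean = sum(returns) / len(returns)
weighted_mean = sum(w * r for w, r in zip(weights, returns))
```

In this toy setting, `weighted_mean` exceeds `uniform_mean` because the few high-return trajectories receive most of the sampling mass. A lower `temperature` concentrates sampling further on them, at the cost of using fewer distinct trajectories.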

Findings

01 Significant performance gains on 72 imbalanced datasets

02 Effective across multiple offline RL algorithms

03 Improves policy quality by focusing on good data

Abstract

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling…

Code & Models

Repositories

Improbable-AI/dw-offline-rl
jax · Official

Videos

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques