Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Zhang-Wei Hong; Pulkit Agrawal; Rémi Tachet des Combes; Romain Laroche

arXiv:2306.13085·cs.LG·June 23, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a trajectory re-weighting method for offline reinforcement learning that improves policy performance by better exploiting the high-return trajectories in mixed datasets. The method applies across a range of algorithms and environments.

Contribution

The paper proposes a re-weighted sampling strategy that improves offline RL performance by emphasizing high-return trajectories and that is compatible with existing algorithms.

Findings

01 Re-weighted sampling improves policy performance on mixed datasets.

02 The approach makes better use of the high-return trajectories in the data.

03 The method remains effective in stochastic environments, even though its theoretical guarantee covers only deterministic MDPs with stochastic initial states.

Abstract

Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and minor high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by low-return trajectories and fail to exploit high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset…
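The re-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax over min-max normalized returns, the `temperature` parameter, the two-stage sampling (a trajectory by weight, then a transition uniformly within it), and all function names are assumptions chosen for illustration. The sketch only shows that sampling trajectories in proportion to a return-based weight produces an artificial dataset whose average return is higher than that of the original mixed dataset.

```python
import math
import random


def trajectory_returns(trajectories):
    """Undiscounted return of each trajectory of (state, action, reward) tuples."""
    return [sum(reward for (_, _, reward) in traj) for traj in trajectories]


def return_weights(returns, temperature=0.5):
    """Assumed weighting: softmax over min-max normalized returns.

    A lower temperature concentrates more probability on high-return trajectories.
    """
    lowest, highest = min(returns), max(returns)
    span = highest - lowest
    if span == 0:
        # All returns are equal, so fall back to uniform sampling.
        return [1.0 / len(returns)] * len(returns)
    logits = [(g - lowest) / span / temperature for g in returns]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(trajectories, weights, batch_size, rng):
    """Pick a trajectory by weight, then one of its transitions uniformly."""
    picks = rng.choices(range(len(trajectories)), weights=weights, k=batch_size)
    return [rng.choice(trajectories[i]) for i in picks]


# Toy mixed dataset: two low-return trajectories and one high-return trajectory.
trajectories = [
    [("s0", "a0", 0.0), ("s1", "a1", 1.0)],
    [("s0", "a1", 0.0), ("s2", "a0", 1.0)],
    [("s0", "a2", 5.0), ("s3", "a0", 5.0)],
]
returns = trajectory_returns(trajectories)
weights = return_weights(returns, temperature=0.5)

# Average return under uniform sampling versus under the re-weighted sampling.
uniform_mean = sum(returns) / len(returns)
weighted_mean = sum(w * g for w, g in zip(weights, returns))

batch = sample_batch(trajectories, weights, batch_size=8, rng=random.Random(0))
```

In this sketch the batch would be passed unchanged to whatever offline RL algorithm is in use, which is how a sampling-level change stays compatible with existing algorithms. Whether weights should be applied per trajectory or per transition is a design choice this example does not settle.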

Code & Models

Repositories

improbable-ai/harness-offline-rl (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Implicit Q-Learning