Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

Chenjia Bai; Lingxiao Wang; Jianye Hao; Zhuoran Yang; Bin Zhao; Zhen Wang; Xuelong Li

arXiv:2404.19346 · cs.LG · May 1, 2024

TL;DR

This paper introduces a pessimistic value iteration method for multi-task offline reinforcement learning that effectively shares datasets across tasks, addressing distribution shift issues and improving performance in challenging domains.

Contribution

It proposes an uncertainty-based multi-task data sharing approach with theoretical guarantees and demonstrates superior empirical results on a new benchmark.

Findings

01 Outperforms state-of-the-art methods in multi-task offline RL

02 Provides theoretical analysis linking optimality gap to data coverage

03 Introduces a new benchmark and datasets for multi-task offline RL

Abstract

Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the…
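
The idea of ensemble-based pessimism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pessimistic_targets`, the penalty form (ensemble mean minus `beta` times ensemble standard deviation), and the parameters `gamma` and `beta` are all assumptions made for illustration. The sketch computes a Bellman target for a batch of transitions and lowers it wherever the ensemble members disagree, which is where shared data from other tasks would be expected to fall outside the well-covered region.

```python
import numpy as np


def pessimistic_targets(rewards, next_q_ensemble, gamma=0.99, beta=1.0):
    """Illustrative pessimistic Bellman target (hypothetical helper).

    rewards:         array of shape (batch,)
    next_q_ensemble: array of shape (n_members, batch), each row holding one
                     ensemble member's Q-value estimate at the next
                     state-action pair
    gamma:           discount factor (assumed value)
    beta:            weight of the uncertainty penalty (assumed value)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_ensemble = np.asarray(next_q_ensemble, dtype=float)

    # Ensemble mean as the value estimate.
    mean_q = next_q_ensemble.mean(axis=0)
    # Ensemble disagreement (population standard deviation) as the
    # uncertainty estimate; this choice is an assumption of the sketch.
    uncertainty = next_q_ensemble.std(axis=0)

    # Subtract the penalty so that uncertain transitions get lower targets.
    return rewards + gamma * (mean_q - beta * uncertainty)


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    # First transition: members agree. Second transition: members disagree.
    next_q = [[2.0, 1.0],
              [2.0, 3.0]]
    print(pessimistic_targets(rewards, next_q))
```

In this toy batch both transitions have the same ensemble mean, but the second receives a lower target because its members disagree. Under these assumptions the same rule applies whether a transition comes from the target task or from a shared dataset, so no explicit data selection step is needed.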

Code & Models

Repositories

baichenjia/utds (PyTorch, official)
