TL;DR
This paper introduces an uncertainty-based pessimistic value iteration method for multi-task offline reinforcement learning. It shares the entire dataset across tasks without data selection and uses ensemble-based uncertainty quantification to counter the distribution shift that direct data sharing exacerbates.
Contribution
It proposes an uncertainty-based multi-task data sharing approach with theoretical guarantees and demonstrates superior empirical results on a new benchmark.
Findings
Outperforms state-of-the-art methods in multi-task offline RL
Provides theoretical analysis linking optimality gap to data coverage
Introduces a new benchmark and datasets for multi-task offline RL
Abstract
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the…
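The idea of pessimistic value iteration with ensemble-based uncertainty, as described in the abstract, can be illustrated with a small tabular sketch. This is not the paper's implementation: the function name `pessimistic_value_iteration`, the parameters `beta` and `lr`, the tabular setting, the use of random initialisation as the only source of ensemble diversity, and the assumption that shared transitions have already been relabeled with the target task's reward are all hypothetical choices made for illustration.

```python
import numpy as np


def pessimistic_value_iteration(transitions, n_states, n_actions,
                                n_ensemble=5, beta=1.0, gamma=0.9,
                                n_iters=50, lr=0.5, seed=0):
    """Tabular sketch of ensemble-based pessimistic value iteration.

    transitions: iterable of (s, a, r, s_next) tuples pooled from all tasks,
    with rewards assumed to be already relabeled for the target task.
    Returns (lcb, std), each of shape (n_states, n_actions).
    """
    rng = np.random.default_rng(seed)
    # Members differ only by random initialisation (an assumption of this sketch),
    # so they keep disagreeing on (s, a) pairs the dataset never visits.
    q = rng.normal(0.0, 1.0, size=(n_ensemble, n_states, n_actions))
    transitions = list(transitions)

    for _ in range(n_iters):
        # Lower confidence bound: ensemble mean minus beta times ensemble std.
        lcb = q.mean(axis=0) - beta * q.std(axis=0)
        # Pessimistic targets, shared by all members within this sweep.
        targets = [r + gamma * lcb[s_next].max() for (_, _, r, s_next) in transitions]
        for k in range(n_ensemble):
            for (s, a, _, _), y in zip(transitions, targets):
                # Move each member's estimate toward the pessimistic target.
                q[k, s, a] += lr * (y - q[k, s, a])

    std = q.std(axis=0)
    return q.mean(axis=0) - beta * std, std
```

In this sketch, state-action pairs covered by the shared dataset end up with near-zero ensemble disagreement, while uncovered pairs retain their initial disagreement and are penalised in proportion to `beta`. A larger `beta` would make the resulting values more conservative on poorly covered pairs.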
