Augmenting Offline RL with Unlabeled Data

Zhao Wang; Briti Gangopadhyay; Jia-Fong Yeh; Shingo Takamatsu

arXiv:2406.07117·cs.AI·June 12, 2024

Open Access

TL;DR

This paper introduces an offline RL framework that pairs a teacher-student model with a policy similarity measure, so that the student policy can draw on external knowledge. This addresses the Out-of-Distribution (OOD) issue without relying solely on dataset support.

Contribution

The paper proposes an offline RL approach built on a teacher-student framework and a policy similarity measure, enabling knowledge transfer from separate datasets to improve OOD handling.

Findings

01. The method effectively incorporates external knowledge into offline RL.

02. The teacher-student framework improves policy generalization.

03. The approach opens new research directions for knowledge transfer in offline RL.

Abstract

Recent advancements in offline Reinforcement Learning (Offline RL) have led to an increased focus on methods based on conservative policy updates to address the Out-of-Distribution (OOD) issue. These methods typically involve adding behavior regularization or modifying the critic learning objective, focusing primarily on states or actions with substantial dataset support. However, we challenge this prevailing notion by asserting that the absence of an action or state from a dataset does not necessarily imply its suboptimality. In this paper, we propose a novel approach to tackle the OOD problem. We introduce an offline RL teacher-student framework, complemented by a policy similarity measure. This framework enables the student policy to gain insights not only from the offline RL dataset but also from the knowledge transferred by a teacher policy. The teacher policy is trained using…
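
The abstract is cut off before it specifies the similarity measure or the training objective, so the following is only a minimal sketch of the general idea: a student objective that combines a dataset-fitting term with a term rewarding agreement with a teacher policy. The names `policy_similarity`, `student_loss`, and `alpha`, and the choice of negative squared action distance as the similarity, are all assumptions made for illustration and are not taken from the paper. The value-based policy-improvement term that an offline RL method would normally include is omitted for brevity.

```python
import numpy as np


def policy_similarity(student_actions, teacher_actions):
    """Hypothetical similarity: negative mean squared distance between
    the actions two policies output on the same states (0 = identical)."""
    diff = np.asarray(student_actions) - np.asarray(teacher_actions)
    return -np.mean(np.sum(diff ** 2, axis=-1))


def student_loss(student_actions, dataset_actions, teacher_actions, alpha=0.5):
    """Illustrative student objective (an assumption, not the paper's loss):
    a behavior-cloning term on dataset actions plus an alpha-weighted
    penalty for disagreeing with the teacher policy."""
    diff = np.asarray(student_actions) - np.asarray(dataset_actions)
    bc_term = np.mean(np.sum(diff ** 2, axis=-1))
    teacher_term = -policy_similarity(student_actions, teacher_actions)
    return bc_term + alpha * teacher_term


if __name__ == "__main__":
    # Two states, two-dimensional actions; values are made up for the demo.
    student = np.zeros((2, 2))
    dataset = np.ones((2, 2))
    teacher = np.zeros((2, 2))
    print(student_loss(student, dataset, teacher))
```

In this sketch, `alpha` controls how strongly the student is pulled toward the teacher relative to the dataset; setting it to 0 recovers plain behavior cloning.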

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus