Augmenting Offline RL with Unlabeled Data
Zhao Wang, Briti Gangopadhyay, Jia-Fong Yeh, Shingo Takamatsu

TL;DR
This paper introduces an offline RL framework that pairs a teacher-student setup with a policy similarity measure, letting the student policy draw on external knowledge to handle Out-of-Distribution (OOD) states and actions instead of relying solely on dataset support.
Contribution
It proposes a new offline RL approach using a teacher-student framework and policy similarity, enabling knowledge transfer from separate datasets to improve OOD handling.
Findings
The method effectively incorporates external knowledge into offline RL.
The teacher-student framework improves policy generalization.
It opens new research directions for knowledge transfer in offline RL.
Abstract
Recent advancements in offline Reinforcement Learning (Offline RL) have led to an increased focus on methods based on conservative policy updates to address the Out-of-Distribution (OOD) issue. These methods typically involve adding behavior regularization or modifying the critic learning objective, focusing primarily on states or actions with substantial dataset support. However, we challenge this prevailing notion by asserting that the absence of an action or state from a dataset does not necessarily imply its suboptimality. In this paper, we propose a novel approach to tackle the OOD problem. We introduce an offline RL teacher-student framework, complemented by a policy similarity measure. This framework enables the student policy to gain insights not only from the offline RL dataset but also from the knowledge transferred by a teacher policy. The teacher policy is trained using…
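To make the teacher-student idea concrete, here is a minimal sketch of how a student objective might combine learning from the offline dataset with knowledge transferred from a teacher policy. The abstract does not specify the similarity measure or the loss, so everything below is an assumption made for illustration rather than the authors' method: the discrete action space, the `exp(-KL)` similarity score, the behavior-cloning term, and the names `policy_similarity`, `student_loss`, and `transfer_weight` are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-8):
    """Per-state KL(p || q) between two batches of action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def policy_similarity(teacher_probs, student_probs):
    """ASSUMPTION: similarity scored as exp(-KL), which lies in (0, 1]
    and equals 1 when the two policies agree on a state."""
    return np.exp(-kl_divergence(teacher_probs, student_probs))


def student_loss(student_logits, dataset_actions, teacher_probs,
                 transfer_weight=0.5):
    """ASSUMPTION: dataset term (behavior cloning) plus a teacher term
    (KL from teacher to student), mixed by a hypothetical weight."""
    student_probs = softmax(student_logits)
    batch = np.arange(len(dataset_actions))
    # Negative log-likelihood of the actions actually seen in the dataset.
    dataset_term = -np.log(student_probs[batch, dataset_actions] + 1e-8).mean()
    # Penalty for disagreeing with the teacher, averaged over states.
    teacher_term = kl_divergence(teacher_probs, student_probs).mean()
    return dataset_term + transfer_weight * teacher_term


# Toy batch: 2 states, 3 discrete actions, made-up numbers.
student_logits = np.zeros((2, 3))             # uniform student policy
dataset_actions = np.array([0, 2])            # actions logged in the dataset
teacher_probs = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.2, 0.7]])   # teacher's action preferences

print(policy_similarity(teacher_probs, softmax(student_logits)))
print(student_loss(student_logits, dataset_actions, teacher_probs))
```

In this sketch, raising `transfer_weight` pulls the student toward the teacher even on actions the dataset never shows, which is one way to act on the abstract's point that an action's absence from the dataset does not imply it is suboptimal.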
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Focus
