A Non-Monolithic Policy Approach of Offline-to-Online Reinforcement Learning

JaeYoon Kim; Junyu Xuan; Christy Liang; Farookh Hussain

arXiv:2410.23737 · cs.LG · November 1, 2024

TL;DR

This paper introduces a non-monolithic exploration method for offline-to-online reinforcement learning that balances offline policy exploitation with online policy exploration, improving upon existing approaches like Policy Expansion (PEX).

Contribution

The proposed method assigns exploitation to the pre-trained offline policy and exploration to the online policy, balancing the two without modifying the offline policy, and achieves better performance than PEX.
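The division of labour described here (the frozen offline policy exploits, the online policy explores) can be pictured as an actor that switches between two modes within an episode. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ModeSwitchingActor`, the fixed-probability switching trigger, the minimum mode length, scalar actions and the toy policies are all hypothetical. The offline policy is only called, never updated, matching the requirement that it stay unmodified.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: observations are float vectors, actions are floats.
Policy = Callable[[Sequence[float]], float]


class ModeSwitchingActor:
    """Hypothetical intra-episode switch between an exploit mode (frozen offline
    policy) and an explore mode (online policy). The trigger rule is an
    assumption for illustration, not the rule used in the paper."""

    def __init__(self, offline_policy: Policy, online_policy: Policy,
                 p_switch: float = 0.1, min_mode_len: int = 5, seed: int = 0):
        self.offline_policy = offline_policy  # pre-trained; never updated here
        self.online_policy = online_policy    # assumed to be trained online elsewhere
        self.p_switch = p_switch              # chance of flipping mode per eligible step
        self.min_mode_len = min_mode_len      # steps a mode must last before it may flip
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        """Call at the start of each episode: begin in exploit mode."""
        self.mode = "exploit"
        self.steps_in_mode = 0

    def act(self, obs: Sequence[float]) -> Tuple[float, str]:
        # Assumed "blind" trigger: once the current mode has lasted
        # min_mode_len steps, flip it with probability p_switch.
        if self.steps_in_mode >= self.min_mode_len and self.rng.random() < self.p_switch:
            self.mode = "explore" if self.mode == "exploit" else "exploit"
            self.steps_in_mode = 0
        self.steps_in_mode += 1
        policy = self.offline_policy if self.mode == "exploit" else self.online_policy
        return policy(obs), self.mode


if __name__ == "__main__":
    # Toy policies: the offline policy always outputs 0.0, the online policy 1.0.
    actor = ModeSwitchingActor(lambda o: 0.0, lambda o: 1.0, p_switch=1.0, min_mode_len=3)
    print([actor.act([0.0])[1] for _ in range(6)])
    # prints ['exploit', 'exploit', 'exploit', 'explore', 'explore', 'explore']
```

With `p_switch=1.0` the modes alternate every three steps; a smaller value produces irregular, longer stretches of each mode. When and how to switch is the central design question for a method of this kind, so the trigger here should be read as a placeholder.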

Findings

01. Outperforms PEX on downstream tasks
02. Balances exploitation and exploration effectively
03. Enhances data efficiency in RL

Abstract

Offline-to-online reinforcement learning (RL) leverages both pre-trained offline policies and online policies trained for downstream tasks, aiming to improve data efficiency and accelerate performance enhancement. An existing approach, Policy Expansion (PEX), utilizes a policy set composed of both policies without modifying the offline policy for exploration and learning. However, this approach fails to ensure sufficient learning of the online policy due to an excessive focus on exploration with both policies. Since the pre-trained offline policy can assist the online policy in exploiting a downstream task based on its prior experience, it should be executed effectively and tailored to the specific requirements of the downstream task. In contrast, the online policy, with its immature behavioral strategy, has the potential for exploration during the training phase. Therefore, our…
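As the abstract notes, PEX keeps the offline policy fixed and places it in a policy set together with the online policy, using both for exploration and learning. In the original PEX work, as we understand it, each member of the set proposes an action and the critic's Q-values decide which proposal is executed, via a Boltzmann (softmax) distribution with a temperature. The sketch below illustrates only that selection step, under those assumptions: the function name `select_from_policy_set`, the toy policies, the toy critic and the temperature value are hypothetical, and scalar actions are used to keep it short.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Policy = Callable[[Sequence[float]], float]          # hypothetical: obs -> scalar action
Critic = Callable[[Sequence[float], float], float]   # hypothetical: (obs, action) -> Q value


def select_from_policy_set(obs: Sequence[float], policies: List[Policy], q_fn: Critic,
                           temperature: float = 1.0,
                           rng: Optional[random.Random] = None) -> Tuple[float, int, List[float]]:
    """Each policy proposes an action; one proposal is sampled with probability
    proportional to exp(Q(s, a_i) / temperature)."""
    rng = rng if rng is not None else random.Random()
    candidates = [pi(obs) for pi in policies]
    scores = [q_fn(obs, a) / temperature for a in candidates]
    top = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], idx, probs


if __name__ == "__main__":
    offline_policy = lambda obs: 0.5          # frozen pre-trained policy (toy)
    online_policy = lambda obs: -0.2          # policy being trained online (toy)
    critic = lambda obs, a: -(a - 0.4) ** 2   # toy critic preferring actions near 0.4
    action, idx, probs = select_from_policy_set(
        [0.0], [offline_policy, online_policy], critic, temperature=0.1, rng=random.Random(0))
    print(action, idx, [round(p, 3) for p in probs])
```

A low temperature makes the choice nearly greedy with respect to the critic, while a high temperature approaches a uniform pick between the two policies; in this toy setup the offline proposal receives roughly 97% of the probability mass.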

Code & Models

Repositories

jangikim2/Offline-to-online-RL-with-non-monolithic-exploration-methodology (PyTorch, official)

Taxonomy

Topics: Smart Grid Energy Management · Supply Chain and Inventory Management · Auction Theory and Applications

Methods: Sparse Evolutionary Training · Focus