Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan

arXiv:2405.18520·cs.LG·May 30, 2024

TL;DR

This paper introduces Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that trains an offline policy on the shared replay buffer and, whenever that policy outperforms the online one, uses it to improve online learning, raising sample efficiency and performance across diverse tasks.

Contribution

The paper proposes a new offline-boosted actor-critic method that adaptively identifies and utilizes superior offline policies to improve online reinforcement learning.

Findings

01. OBAC outperforms popular model-free RL baselines.

02. OBAC rivals advanced model-based RL methods.

03. OBAC demonstrates superior sample efficiency and performance across 53 tasks.

Abstract

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, we discover that concurrently training an offline RL policy based on the shared online replay buffer can sometimes outperform the original online learning policy, though the occurrence of such performance gains remains uncertain. This motivates a new possibility of harnessing the emergent outperforming offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison,…
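The value-comparison idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated here and does not give the objective, so the function names (`offline_outperforms`, `guided_actor_loss`), the squared-distance penalty, and the `weight` parameter are all assumptions made for illustration. The sketch only shows the gating pattern: the online actor is pulled toward the offline policy's action when, and only when, the offline policy's estimated value is higher.

```python
def offline_outperforms(v_online, v_offline):
    """Return True when the offline policy's estimated value is higher."""
    return v_offline > v_online


def guided_actor_loss(q_value, online_action, offline_action,
                      v_online, v_offline, weight=1.0):
    """Hypothetical actor loss gated by a value comparison.

    q_value: critic estimate Q(s, a) for the online policy's action.
    online_action / offline_action: equal-length lists of floats.
    v_online / v_offline: state-value estimates of the two policies.
    weight: assumed coefficient on the guidance term.
    """
    # Base actor objective: maximise Q, i.e. minimise -Q.
    loss = -q_value
    if offline_outperforms(v_online, v_offline):
        # Assumed guidance term: mean squared distance to the offline action.
        squared = [(a - b) ** 2 for a, b in zip(online_action, offline_action)]
        loss += weight * sum(squared) / len(squared)
    return loss


if __name__ == "__main__":
    # The offline policy looks better here, so the guidance term is active.
    print(guided_actor_loss(2.0, [0.0, 0.0], [1.0, 1.0], v_online=1.0, v_offline=3.0))
    # The online policy looks better here, so the loss is plain -Q.
    print(guided_actor_loss(2.0, [0.0, 0.0], [1.0, 1.0], v_online=1.0, v_offline=0.5))
```

In a real agent the scalars would be batched tensors from learned critics, and the comparison would be made per state; the official repository listed under Code & Models is the place to check how the method actually does this.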

Code & Models

Repositories

roythuly/obac (PyTorch, official)

Taxonomy

Topics: Game Theory and Applications · Opinion Dynamics and Social Influence