CUP: Critic-Guided Policy Reuse

Jin Zhang; Siyuan Li; Chongjie Zhang

arXiv:2210.08153 · cs.AI · October 18, 2022

TL;DR

CUP is a policy reuse method for deep reinforcement learning that uses the critic of an actor-critic algorithm to guide source policy selection, achieving efficient transfer without training any extra components.

Contribution

The paper introduces CUP (Critic-gUided Policy reuse), a policy reuse algorithm that relies on the critic already present in actor-critic methods, so it trains no extra components and improves transfer efficiency in DRL.

Findings

01 CUP outperforms baseline algorithms on transfer tasks.

02 CUP comes with a theoretical guarantee of monotonic improvement.

03 Empirical results show significant gains in transfer efficiency.

Abstract

The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies. CUP utilizes the critic, a common component in actor-critic methods, to evaluate and choose…
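
The critic-guided selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes deterministic policies, a scalar critic Q(s, a), and a per-state choice among the source policies plus the current target policy. The names select_guidance_policy, toy_critic, target, and sources are hypothetical and exist only for this example.

```python
from typing import Callable, Sequence, Tuple

State = float
Action = float
Policy = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def select_guidance_policy(
    state: State,
    target_policy: Policy,
    source_policies: Sequence[Policy],
    critic: Critic,
) -> Tuple[int, Action]:
    """Return (index, action) of the candidate the critic scores highest.

    Index 0 is the target policy itself; indices 1.. are source policies.
    Including the target policy as a candidate is an assumption of this sketch.
    Ties go to the lowest index, so the target policy wins a tie.
    """
    candidates = [target_policy, *source_policies]
    actions = [pi(state) for pi in candidates]
    scores = [critic(state, a) for a in actions]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, actions[best]


# Toy 1-D problem (hypothetical): the critic prefers actions close to 2 * state.
def toy_critic(s: State, a: Action) -> float:
    return -(a - 2.0 * s) ** 2


target = lambda s: 0.0                      # current target policy
sources = [lambda s: s, lambda s: 2.0 * s]  # two source policies

index, action = select_guidance_policy(1.5, target, sources, toy_critic)
print(index, action)
```

In this toy setup the second source policy is selected at state 1.5 because its action maximizes the critic. No component beyond the existing critic is trained to make the choice, which is the property the abstract emphasizes.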

Code & Models

Repositories

nagisazj/cup (PyTorch, official)

Videos

CUP: Critic-Guided Policy Reuse · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics