CUP: Critic-Guided Policy Reuse
Jin Zhang, Siyuan Li, Chongjie Zhang

TL;DR
CUP is a novel policy reuse method in deep reinforcement learning that uses the critic to guide source policy selection, leading to efficient transfer without extra training components.
Contribution
The paper introduces CUP, a critic-guided policy reuse algorithm that avoids additional training components and improves transfer efficiency in DRL.
Findings
CUP outperforms baseline algorithms in transfer tasks.
CUP comes with a theoretical monotonic improvement guarantee.
Empirical results show significant transfer efficiency gains.
Abstract
The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies. CUP utilizes the critic, a common component in actor-critic methods, to evaluate and choose…
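To make the idea of critic-guided selection concrete, here is a minimal sketch for a single state with discrete actions. The abstract is truncated before it spells out the selection rule, so the rule used here (score each candidate policy by its expected Q-value under the target task's critic and take the highest) is an assumption, as are the names `expected_q` and `select_guidance_policy`. It is an illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def expected_q(action_probs: Sequence[float], q_values: Sequence[float]) -> float:
    """Expected critic value of a policy at one state: sum_a pi(a|s) * Q(s, a)."""
    return sum(p * q for p, q in zip(action_probs, q_values))


def select_guidance_policy(
    q_values: Sequence[float],
    target_probs: Sequence[float],
    source_probs_list: List[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Pick the candidate policy the critic rates highest at this state.

    Assumed rule (not confirmed by the truncated abstract): argmax of the
    expected Q-value. Index 0 is the current target policy, so on ties the
    target policy is kept and no source policy is reused.
    """
    candidates = [target_probs, *source_probs_list]
    values = [expected_q(p, q_values) for p in candidates]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values


# Hypothetical numbers for one state with three actions.
q = [1.0, 0.0, 2.0]                # critic estimates Q(s, a) on the target task
target = [0.5, 0.5, 0.0]           # current target policy
sources = [[0.0, 0.0, 1.0],        # source policy 1
           [0.0, 1.0, 0.0]]        # source policy 2
best, values = select_guidance_policy(q, target, sources)
```

Because the only learned quantity consulted is the critic that an actor-critic method already maintains, a rule of this shape needs no extra high-level policy and no separate value estimates for the source policies, which is the property the abstract emphasizes.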
Taxonomy
Topics: Reinforcement Learning in Robotics
