Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

Wei Hung; Bo-Kai Huang; Ping-Chun Hsieh; Xi Liu

arXiv:2212.03117 · cs.LG · July 26, 2024

TL;DR

Q-Pensieve is a novel multi-objective reinforcement learning method that enhances sample efficiency by sharing knowledge through stored Q-snapshots, outperforming existing methods on benchmark tasks.

Contribution

It introduces a Q-snapshot memory sharing approach integrated with soft policy iteration, improving sample efficiency in MORL.

Findings

01 Outperforms benchmark MORL methods with fewer samples

02 Demonstrates effective policy improvement via Q-snapshots

03 Provides convergence guarantees for the proposed method

Abstract

Many real-world continuous control problems involve trade-offs among competing objectives, and multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives. However, existing MORL methods either rely on multiple passes of explicit search for the Pareto front, and are therefore not sample-efficient, or use a shared policy network for coarse knowledge sharing among policies. To boost the sample efficiency of MORL, we propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction and thereby enables data sharing at the policy level. We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee. To substantiate this concept, we propose the technique of Q replay buffer, which…
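
The abstract is cut off before it describes the update rule, so the following is only a minimal Python sketch of the general idea of keeping Q-snapshots in a bounded memory and consulting all of them when scoring an action. It is not the authors' implementation. The names `QSnapshotBuffer`, `make_linear_q`, and `scalarized_max_q` are hypothetical, the toy Q-functions are linear, and the sketch assumes linear scalarization by a preference vector with a max over snapshots; the entropy term of soft policy iteration is omitted.

```python
from collections import deque

import numpy as np


class QSnapshotBuffer:
    """Bounded FIFO memory of frozen Q-function snapshots (hypothetical helper)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest snapshot when full
        self.snapshots = deque(maxlen=capacity)

    def push(self, q_fn):
        self.snapshots.append(q_fn)

    def __len__(self):
        return len(self.snapshots)


def make_linear_q(weights):
    """Toy vector-valued Q: one row of weights per objective (assumption)."""
    w = np.asarray(weights, dtype=float)

    def q(state, action):
        # Returns one Q-value per objective for the (state, action) pair
        return w @ np.concatenate([state, action])

    return q


def scalarized_max_q(buffer, current_q, state, action, preference):
    """Largest preference-weighted Q-value over the current critic and all
    stored snapshots (assumed form of the shared improvement signal)."""
    candidates = [current_q] + list(buffer.snapshots)
    return max(float(np.dot(preference, q(state, action))) for q in candidates)
```

In this sketch, a snapshot trained under one preference can still supply the best scalarized value for a different preference, which is the kind of policy-level sharing the abstract refers to.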

Code & Models

Repositories

NYCU-RL-Bandits-Lab/Q-Pensieve
PyTorch · Official

Videos

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control