Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots
Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu

TL;DR
Q-Pensieve is a novel multi-objective reinforcement learning method that enhances sample efficiency by sharing knowledge through stored Q-snapshots, outperforming existing methods on benchmark tasks.
Contribution
It introduces a Q-snapshot memory sharing approach integrated with soft policy iteration, improving sample efficiency in MORL.
Findings
Outperforms benchmark MORL methods with fewer samples
Demonstrates effective policy improvement via Q-snapshots
Provides convergence guarantees for the proposed method
Abstract
Many real-world continuous control problems involve weighing trade-offs among competing objectives, and multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies for different preferences over those objectives. However, existing MORL methods either rely on multiple passes of explicit search to find the Pareto front, and are therefore not sample-efficient, or utilize a shared policy network that provides only coarse knowledge sharing among policies. To boost the sample efficiency of MORL, we propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction and thereby enables data sharing at the policy level. We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee. To substantiate this concept, we propose the technique of Q replay buffer, which…
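The idea of letting stored Q-snapshots jointly shape the policy update can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the names `QReplayBuffer`, `scalarized_max`, and `soft_policy`, the dictionary-based Q-tables, and the discrete action set are all assumptions made for illustration, and the sketch maximizes only over snapshots for a single fixed preference vector. It shows a preference-weighted Q-value taken as the maximum over the current Q and the stored snapshots, followed by a softmax (soft policy improvement) over actions.

```python
import math
from collections import deque


class QReplayBuffer:
    """FIFO store of past Q-function snapshots (hypothetical interface)."""

    def __init__(self, capacity):
        # deque with maxlen silently evicts the oldest snapshot when full.
        self.snapshots = deque(maxlen=capacity)

    def add(self, q_table):
        # Store a frozen copy so later edits to the live table leave it intact.
        self.snapshots.append({k: tuple(v) for k, v in q_table.items()})


def scalarized_max(buffer, current_q, preference, state, action):
    """Largest preference-weighted Q-value over the current Q and all snapshots."""
    candidates = [current_q, *buffer.snapshots]
    return max(
        sum(w * q for w, q in zip(preference, table[(state, action)]))
        for table in candidates
    )


def soft_policy(buffer, current_q, preference, state, actions, alpha=1.0):
    """Softmax over actions of the snapshot-maximized scalarized Q (temperature alpha)."""
    scores = [
        scalarized_max(buffer, current_q, preference, state, a) / alpha
        for a in actions
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data (assumed): one state "s", two actions, two objectives.
buffer = QReplayBuffer(capacity=2)
buffer.add({("s", 0): [1.0, 0.0], ("s", 1): [0.0, 0.2]})
current_q = {("s", 0): [0.5, 0.1], ("s", 1): [0.0, 1.0]}
preference = (0.8, 0.2)

probs = soft_policy(buffer, current_q, preference, "s", actions=[0, 1])
```

In this toy setup the stored snapshot, not the current Q-table, supplies the best scalarized value for action 0, so the policy leans toward that action. That is the sense in which older snapshots can still inform the update direction.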
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
