Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning
Jonaid Shianifar, Michael Schukat, Karl Mason

TL;DR
This paper introduces Hindsight Preference Replay (HPR), a simple replay augmentation strategy that improves preference-conditioned multi-objective reinforcement learning by relabeling stored transitions with alternative preferences. With CAPQL as the base method, it improves hypervolume in five of six and expected utility in four of six MO-Gymnasium locomotion tasks.
Contribution
The paper presents Hindsight Preference Replay, a novel method that enhances preference-conditioned RL by densifying supervision without changing existing architectures.
Findings
HPR improves hypervolume in 5 of 6 environments.
HPR increases expected utility in 4 of 6 environments.
HPR yields significant performance gains on complex locomotion tasks.
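The two headline metrics in these findings, hypervolume and expected utility, can be illustrated with a minimal sketch. It assumes maximization, a two-objective front, a linear utility w·v, and a user-chosen reference point; the function names `expected_utility` and `hypervolume_2d` are hypothetical and are not taken from the paper or from any evaluation library.

```python
import numpy as np

def expected_utility(front, weights):
    """Mean over weight vectors of the best linear utility w.v on the front.

    Assumption: linear utility; each row of `weights` is one preference.
    """
    front = np.asarray(front, dtype=float)      # shape (num_points, k)
    weights = np.asarray(weights, dtype=float)  # shape (num_weights, k)
    utilities = weights @ front.T               # shape (num_weights, num_points)
    return float(np.mean(np.max(utilities, axis=1)))

def hypervolume_2d(front, ref):
    """Area dominated by a two-objective front, measured from `ref`.

    Assumption: both objectives are maximized; points that do not
    strictly exceed `ref` in both objectives are ignored.
    """
    pts = np.asarray(front, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pts = pts[np.all(pts > ref, axis=1)]
    pts = pts[np.argsort(-pts[:, 0])]           # sweep from largest first objective
    hv, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:                          # only non-dominated points add area
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
eum = expected_utility(front, weights)
hv = hypervolume_2d(front, ref=(0.0, 0.0))
```

Higher values are better for both quantities. The paper's tasks may have more than two objectives, in which case a general hypervolume routine would be needed instead of this two-dimensional sweep.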
Abstract
Multi-objective reinforcement learning (MORL) enables agents to optimize vector-valued rewards while respecting user preferences. CAPQL, a preference-conditioned actor-critic method, achieves this by conditioning on weight vectors w, but it restricts data usage to the specific preferences under which the data was collected, leaving off-policy data from other preferences unused. We introduce Hindsight Preference Replay (HPR), a simple and general replay augmentation strategy that retroactively relabels stored transitions with alternative preferences. This densifies supervision across the preference simplex without altering the CAPQL architecture or loss functions. Evaluated on six MO-Gymnasium locomotion tasks at a fixed 300,000-step budget using expected utility (EUM), hypervolume (HV), and sparsity, HPR-CAPQL improves HV in five of six environments and EUM in four of six. On mo-humanoid-v5, for…
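The relabeling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the batch layout, the `relabel_fraction` parameter, and uniform Dirichlet sampling over the preference simplex are all assumptions, and `sample_preferences` and `relabel_batch` are hypothetical names.

```python
import numpy as np

def sample_preferences(rng, n, num_objectives):
    """Draw n weight vectors from the simplex (assumption: uniform Dirichlet)."""
    return rng.dirichlet(np.ones(num_objectives), size=n)

def relabel_batch(batch, rng, relabel_fraction=0.5):
    """Return a copy of a sampled batch with some preferences replaced.

    Assumptions: batch["w"] has shape (n, k) and holds the preference each
    transition was collected under; batch["reward"] holds the vector reward,
    which does not depend on w and is therefore left unchanged.
    """
    w_old = batch["w"]
    n, k = w_old.shape
    mask = rng.random(n) < relabel_fraction          # which transitions to relabel
    w_new = sample_preferences(rng, n, k)
    out = dict(batch)                                # shallow copy; other fields shared
    out["w"] = np.where(mask[:, None], w_new, w_old) # new array, original untouched
    return out, mask

rng = np.random.default_rng(0)
batch = {
    "w": np.tile([1.0, 0.0], (8, 1)),                # all collected under w = (1, 0)
    "reward": np.arange(16.0).reshape(8, 2),         # vector-valued rewards
}
relabeled, mask = relabel_batch(batch, rng)
# Scalarized reward under the (possibly relabeled) preference, shown for illustration.
scalar_reward = np.sum(relabeled["w"] * relabeled["reward"], axis=1)
```

Because the vector reward is stored rather than a scalarized one, the same transition stays valid under any weight vector, which is what lets a preference-conditioned learner reuse it. How the relabeled batch enters the CAPQL losses is not shown here.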
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Action Observation and Synchronization
