Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning

Jonaid Shianifar; Michael Schukat; Karl Mason

arXiv:2601.11604·cs.LG·January 21, 2026

TL;DR

This paper introduces Hindsight Preference Replay (HPR), a simple replay augmentation strategy for preference-conditioned multi-objective reinforcement learning. HPR relabels stored transitions with alternative preferences and, applied to CAPQL, improves hypervolume in five of six and expected utility in four of six MO-Gymnasium locomotion tasks.

Contribution

The paper presents Hindsight Preference Replay, a method that densifies supervision across the preference simplex for preference-conditioned RL without changing the CAPQL architecture or loss functions.

Findings

1. HPR improves hypervolume in 5 of 6 environments.

2. HPR increases expected utility in 4 of 6 environments.

3. Significant performance gains on complex locomotion tasks.

Abstract

Multi-objective reinforcement learning (MORL) enables agents to optimize vector-valued rewards while respecting user preferences. CAPQL, a preference-conditioned actor-critic method, achieves this by conditioning on weight vectors w, but it restricts data usage to the specific preferences under which the data was collected, leaving off-policy data from other preferences unused. We introduce Hindsight Preference Replay (HPR), a simple and general replay augmentation strategy that retroactively relabels stored transitions with alternative preferences. This densifies supervision across the preference simplex without altering the CAPQL architecture or loss functions. We evaluate on six MO-Gymnasium locomotion tasks at a fixed 300,000-step budget using expected utility (EUM), hypervolume (HV), and sparsity. HPR-CAPQL improves HV in five of six environments and EUM in four of six. On mo-humanoid-v5, for…
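
The relabeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the batch layout, the function names `sample_preferences` and `relabel_batch`, the `relabel_prob` parameter, and the uniform Dirichlet sampling of replacement preferences are all assumptions made for illustration. The sketch relies only on the observation that the stored reward vector does not depend on the preference, so a transition collected under one weight vector w can be replayed under another.

```python
import numpy as np


def sample_preferences(rng, n, num_objectives):
    """Draw n weight vectors uniformly from the preference simplex (assumed sampler)."""
    return rng.dirichlet(np.ones(num_objectives), size=n)


def relabel_batch(batch, rng, relabel_prob=0.5):
    """Return a copy of `batch` in which some transitions carry a new preference.

    `batch` is a hypothetical dict of arrays with keys "w" (n x k weights)
    and "reward_vec" (n x k vector rewards); other keys are passed through.
    """
    w = batch["w"]
    n, k = w.shape
    # Choose which transitions to relabel in hindsight.
    mask = rng.random(n) < relabel_prob
    new_w = np.where(mask[:, None], sample_preferences(rng, n, k), w)

    out = dict(batch)
    out["w"] = new_w
    # Linear utility under the (possibly relabeled) preference, shown for
    # illustration only; it is not claimed to be the CAPQL loss.
    out["scalar_reward"] = np.einsum("ij,ij->i", new_w, batch["reward_vec"])
    return out, mask


rng = np.random.default_rng(0)
batch = {
    "w": np.tile(np.array([1.0, 0.0]), (4, 1)),         # collected under one preference
    "reward_vec": np.arange(8, dtype=float).reshape(4, 2),
}
relabeled, mask = relabel_batch(batch, rng, relabel_prob=1.0)
```

With `relabel_prob=1.0` every transition receives a fresh preference, while the reward vectors themselves are left untouched; a learner would then consume `relabeled` exactly as it would an ordinary replay batch.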

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Action Observation and Synchronization