A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

Ying-Tu Chen; Wei Hung; Bing-Shu Wu; Zhang-Wei Hong; Ping-Chun Hsieh

arXiv:2604.24532 · cs.LG · April 28, 2026

TL;DR

This paper applies reward-free reinforcement learning (RFRL) to multi-objective reinforcement learning (MORL), using RFRL's training objective as an auxiliary task so that policies covering diverse user preferences can be learned more data-efficiently.

Contribution

The paper systematically adapts reward-free RL to MORL, proposes a preference-guided exploration strategy, and reports better performance than state-of-the-art MORL methods on MO-Gymnasium tasks.

Findings

01. Outperforms state-of-the-art MORL methods on diverse tasks

02. Achieves higher data efficiency in learning policies

03. Provides the first systematic adaptation of RFRL to MORL

Abstract

Many sequential decision-making tasks involve optimizing multiple conflicting objectives, requiring policies that adapt to different user preferences. In multi-objective reinforcement learning (MORL), one widely studied approach addresses this by training a single policy network conditioned on preference-weighted rewards. In this paper, we explore a novel algorithmic perspective: leveraging reward-free reinforcement learning (RFRL) for MORL. While RFRL has historically been studied independently of MORL, it learns optimal policies for any possible reward function, making it a natural fit for MORL's challenge of handling unknown user preferences. We propose using RFRL's training objective as an auxiliary task to enhance MORL, enabling more effective knowledge sharing beyond the multi-objective reward function given at training time. To this end, we adapt a state-of-the-art RFRL…
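
The "preference-weighted rewards" mentioned in the abstract can be illustrated with a minimal sketch. It assumes linear scalarization (a weighted sum of the per-objective rewards) and preferences drawn uniformly from the probability simplex; the abstract does not say which scalarization or sampling scheme the authors use, so both are assumptions. The function names `sample_preference`, `scalarize`, and `policy_input` are hypothetical and are not taken from the paper or from MO-Gymnasium.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw a preference vector from the probability simplex (assumed scheme)."""
    # Normalised exponential draws are uniform on the simplex,
    # i.e. equivalent to a Dirichlet(1, ..., 1) sample.
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, preference):
    """Linear scalarization (assumed): preference-weighted sum of objectives."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward and preference must have the same length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def policy_input(observation, preference):
    """A preference-conditioned policy receives the observation and the weights."""
    return list(observation) + list(preference)


rng = random.Random(0)                 # fixed seed for repeatability
w = sample_preference(2, rng)          # one preference per training episode
r = [1.0, -0.5]                        # hypothetical two-objective reward
scalar_reward = scalarize(r, w)        # the signal a single-objective learner sees
x = policy_input([0.1, 0.2], w)        # observation concatenated with weights
```

Under these assumptions, a single network trained on many sampled preferences only ever sees rewards of this weighted-sum form, which is the setting the abstract contrasts with RFRL's ability to handle any possible reward function.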
