TL;DR
This paper introduces a novel approach combining reward-free reinforcement learning with multi-objective reinforcement learning, enhancing policy learning across diverse preferences with improved efficiency.
Contribution
It systematically adapts reward-free RL to MORL, proposing a preference-guided exploration strategy and demonstrating superior performance on MO-Gymnasium tasks.
Findings
Outperforms state-of-the-art MORL methods across diverse MO-Gymnasium tasks
Achieves higher data efficiency in learning policies
Provides the first systematic adaptation of RFRL to MORL
Abstract
Many sequential decision-making tasks involve optimizing multiple conflicting objectives, requiring policies that adapt to different user preferences. In multi-objective reinforcement learning (MORL), one widely studied approach addresses this by training a single policy network conditioned on preference-weighted rewards. In this paper, we explore a novel algorithmic perspective: leveraging reward-free reinforcement learning (RFRL) for MORL. While RFRL has historically been studied independently of MORL, it learns optimal policies for any possible reward function, making it a natural fit for MORL's challenge of handling unknown user preferences. We propose using RFRL's training objective as an auxiliary task to enhance MORL, enabling more effective knowledge sharing beyond the multi-objective reward function given at training time. To this end, we adapt a state-of-the-art RFRL…
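To make the phrase "a single policy network conditioned on preference-weighted rewards" concrete, here is a minimal Python sketch of the common preference-conditioned setup. It is an illustration under stated assumptions, not the paper's method: the abstract does not say how rewards are weighted, so linear scalarization (a weighted sum) is assumed, and the function names `sample_preference`, `scalarize`, and `conditioned_input` are hypothetical.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw a preference vector on the simplex (non-negative, sums to 1).

    Assumption: preferences are sampled uniformly from the simplex, done here
    by normalizing independent exponential draws.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, preference):
    """Assumed linear scalarization: weighted sum of per-objective rewards."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward vector and preference must have equal length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def conditioned_input(observation, preference):
    """Concatenate observation and preference so one policy can serve all preferences."""
    return list(observation) + list(preference)


if __name__ == "__main__":
    rng = random.Random(0)
    w = sample_preference(2, rng)            # e.g. a trade-off between two objectives
    r = scalarize([1.0, -2.0], w)            # scalar reward seen by the learner
    x = conditioned_input([0.1, 0.2, 0.3], w)  # policy input: observation plus preference
    print(w, r, x)
```

With a reward vector `[1.0, -2.0]` and preference `[0.75, 0.25]`, `scalarize` returns `0.25`. Resampling the preference during training exposes the same policy to many trade-offs, which is the setting the abstract describes.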