MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning

Yifu Yuan; Zhenrui Zheng; Zibin Dong; Jianye Hao

arXiv:2408.15501 · cs.LG · May 28, 2025

TL;DR

MODULI introduces a diffusion model-based approach for offline multi-objective reinforcement learning, enabling better generalization to out-of-distribution preferences and improving policy alignment with diverse objectives.

Contribution

The paper presents MODULI, a diffusion-model framework with sliding guidance that improves preference generalization in offline multi-objective reinforcement learning (MORL), targeting out-of-distribution (OOD) preferences.

Findings

01. Outperforms state-of-the-art offline MORL methods on the D4MORL benchmark.

02. Demonstrates strong generalization to out-of-distribution (OOD) preferences.

03. Generates trajectories that align with diverse and OOD preferences.

Abstract

Multi-objective Reinforcement Learning (MORL) seeks to develop policies that simultaneously optimize multiple conflicting objectives, but it requires extensive online interactions. Offline MORL provides a promising solution by training on pre-collected datasets to generalize to any preference upon deployment. However, real-world offline datasets are often conservatively and narrowly distributed, failing to comprehensively cover preferences, leading to the emergence of out-of-distribution (OOD) preference areas. Existing offline MORL algorithms exhibit poor generalization to OOD preferences, resulting in policies that do not align with preferences. Leveraging the excellent expressive and generalization capabilities of diffusion models, we propose MODULI (Multi-objective Diffusion Planner with Sliding Guidance), which employs a preference-conditioned diffusion model as a planner to…

Taxonomy

Topics: Energy Efficiency and Management · Reinforcement Learning in Robotics · Consumer Market Behavior and Pricing

Methods: Adapter · Diffusion · ALIGN