MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning
Yifu Yuan, Zhenrui Zheng, Zibin Dong, Jianye Hao

TL;DR
MODULI introduces a diffusion model-based approach for offline multi-objective reinforcement learning, enabling better generalization to out-of-distribution preferences and improving policy alignment with diverse objectives.
Contribution
The paper presents MODULI, a diffusion-model planning framework with sliding guidance that improves preference generalization in offline multi-objective reinforcement learning (MORL), including generalization to out-of-distribution (OOD) preferences.
Findings
MODULI outperforms state-of-the-art offline MORL methods on the D4MORL benchmark.
It generalizes well to out-of-distribution (OOD) preferences.
It generates trajectories that align with diverse preferences, including OOD ones.
Abstract
Multi-objective Reinforcement Learning (MORL) seeks to develop policies that simultaneously optimize multiple conflicting objectives, but it requires extensive online interactions. Offline MORL provides a promising solution by training on pre-collected datasets to generalize to any preference upon deployment. However, real-world offline datasets are often conservatively and narrowly distributed, failing to comprehensively cover preferences, leading to the emergence of out-of-distribution (OOD) preference areas. Existing offline MORL algorithms exhibit poor generalization to OOD preferences, resulting in policies that do not align with preferences. Leveraging the excellent expressive and generalization capabilities of diffusion models, we propose MODULI (Multi-objective Diffusion Planner with Sliding Guidance), which employs a preference-conditioned diffusion model as a planner to…
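To make the notions of a preference and an OOD preference concrete, here is a minimal sketch. It assumes that a preference is a non-negative weight vector summing to one and that it scores a trajectory by a weighted sum of per-objective returns. That is a common MORL convention, not something the abstract states. The helper names `scalarize` and `is_ood`, the distance threshold `tol`, and the toy dataset are all hypothetical and are not taken from the paper.

```python
import numpy as np


def scalarize(returns, preference):
    """Weighted sum of per-objective returns under a preference vector.

    Assumption: preferences are linear weights on the probability simplex.
    """
    returns = np.asarray(returns, dtype=float)
    preference = np.asarray(preference, dtype=float)
    if preference.min() < 0 or not np.isclose(preference.sum(), 1.0):
        raise ValueError("preference must be non-negative and sum to 1")
    return float(returns @ preference)


def is_ood(preference, dataset_preferences, tol=0.05):
    """Flag a preference whose nearest dataset preference is farther than tol.

    Illustrative criterion only (L1 distance to the nearest neighbour);
    the paper may define OOD preference regions differently.
    """
    prefs = np.asarray(dataset_preferences, dtype=float)
    query = np.asarray(preference, dtype=float)
    dists = np.abs(prefs - query).sum(axis=1)
    return bool(dists.min() > tol)


# Hypothetical two-objective dataset that covers only a narrow band of
# preferences, mimicking a conservatively collected offline dataset.
dataset_prefs = np.array([[w, 1.0 - w] for w in np.linspace(0.4, 0.6, 21)])

if __name__ == "__main__":
    print(scalarize([10.0, 2.0], [0.5, 0.5]))
    print(is_ood([0.5, 0.5], dataset_prefs))
    print(is_ood([0.9, 0.1], dataset_prefs))
```

In this toy setup a preference of (0.9, 0.1) falls outside the band the dataset covers, which is the kind of deployment-time query the abstract describes as problematic for existing offline MORL methods.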
Taxonomy
Topics: Energy Efficiency and Management · Reinforcement Learning in Robotics · Consumer Market Behavior and Pricing
Methods: Adapter · Diffusion · ALIGN
