AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model
Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, Zhipeng Hu

TL;DR
AlignDiff introduces a diffusion-based framework that effectively aligns agent behaviors with diverse human preferences, enabling accurate customization and flexible switching in reinforcement learning tasks.
Contribution
This work combines RLHF and diffusion models to quantify human preferences, match agent behavior to them, and switch between them, addressing the abstractness and mutability of those preferences.
Findings
Superior preference matching performance
Effective behavior switching capabilities
Successful adaptation to unseen tasks
Abstract
Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences. To address these issues, we propose AlignDiff, a novel framework that leverages RL from Human Feedback (RLHF) to quantify human preferences, covering abstractness, and utilizes them to guide diffusion planning for zero-shot behavior customizing, covering mutability. AlignDiff can accurately match user-customized behaviors and efficiently switch from one to another. To build the framework, we first establish the multi-perspective human feedback datasets, which contain comparisons for the attributes of diverse behaviors, and then train an attribute strength model to predict quantified relative strengths. After relabeling behavioral datasets with relative strengths, we proceed to train an…
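The abstract describes training an attribute strength model from pairwise comparisons, but it does not state the loss or the architecture. As a rough illustration only, the sketch below assumes a Bradley-Terry style logistic loss and a linear scorer over a single hand-made feature. The names `strength` and `train_strength_model`, the toy data, and the hyperparameters are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def strength(weights, features):
    """Hypothetical linear attribute-strength score for one behavior segment."""
    return sum(w * f for w, f in zip(weights, features))


def train_strength_model(comparisons, dim, lr=0.1, epochs=200):
    """Fit weights so that the segment judged stronger gets the higher score.

    comparisons: list of (features_a, features_b, label), where label is 1.0
    if segment a shows the attribute more strongly than segment b, else 0.0.
    Assumes a Bradley-Terry model: P(a > b) = sigmoid(score_a - score_b).
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for fa, fb, label in comparisons:
            p = sigmoid(strength(weights, fa) - strength(weights, fb))
            # Gradient step on the cross-entropy loss for this comparison.
            for i in range(dim):
                weights[i] -= lr * (p - label) * (fa[i] - fb[i])
    return weights


# Toy data (assumption): one feature per segment, standing in for something
# like average speed; the "human" label simply prefers the larger value.
random.seed(0)
segments = [[random.random()] for _ in range(20)]
comparisons = []
for _ in range(100):
    a, b = random.sample(segments, 2)
    comparisons.append((a, b, 1.0 if a[0] > b[0] else 0.0))

weights = train_strength_model(comparisons, dim=1)
fast_score = strength(weights, [0.9])
slow_score = strength(weights, [0.1])
```

In this toy setup the learned scores could then be used to relabel a behavioral dataset with relative strengths, which is the step the abstract mentions next; how AlignDiff normalizes or conditions on those strengths is not covered by this sketch.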
Taxonomy
Topics: Mental Health Research Topics
Methods: Diffusion
