FDPP: Fine-tune Diffusion Policy with Human Preference
Yuxin Chen, Devesh K. Jha, Masayoshi Tomizuka, Diego Romeres

TL;DR
FDPP introduces a method to adapt pre-trained robotic policies to new human preferences using preference-based reward learning and reinforcement learning, ensuring customization without losing original task performance.
Contribution
The paper presents FDPP, a novel approach combining preference-based reward learning with RL to fine-tune policies for personalized robotic manipulation.
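Preference-based reward learning is commonly built on the Bradley-Terry model: the probability that a human prefers trajectory A over trajectory B is the sigmoid of the difference in their predicted returns, and the reward is fit by minimizing the negative log-likelihood of the observed preferences. This summary does not state FDPP's exact reward model, so the sketch below is an assumption. It uses a hypothetical linear reward over hand-picked per-step features in place of a learned network, and the feature names in the usage lines are illustrative only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def trajectory_return(w, traj):
    """Sum of linear per-step rewards r(s) = w . phi(s) over a list of feature vectors."""
    return sum(sum(wi * xi for wi, xi in zip(w, phi)) for phi in traj)


def preference_prob(w, traj_a, traj_b):
    """Bradley-Terry probability that traj_a is preferred over traj_b."""
    return sigmoid(trajectory_return(w, traj_a) - trajectory_return(w, traj_b))


def fit_reward(pairs, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent on -log P(preferred > rejected) over all pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            p = preference_prob(w, preferred, rejected)
            # Gradient of -log sigmoid(w . (F_pref - F_rej)) is -(1 - p) * (F_pref - F_rej),
            # where F is the per-trajectory sum of step features.
            feat_diff = [
                sum(phi[i] for phi in preferred) - sum(phi[i] for phi in rejected)
                for i in range(dim)
            ]
            w = [wi + lr * (1.0 - p) * d for wi, d in zip(w, feat_diff)]
    return w


# Hypothetical 2-D step features: [task_progress, distance_from_preferred_side].
# A simulated annotator prefers the trajectory that stays near the preferred side.
near = [[1.0, 0.1], [1.0, 0.2]]
far = [[1.0, 0.9], [1.0, 0.8]]
w = fit_reward([(near, far)], dim=2)
print(preference_prob(w, near, far) > 0.5)  # True
print(w[1] < 0)  # True: distance from the preferred side is penalized
```

Because both trajectories make identical task progress, only the preference-related feature receives a nonzero weight. This mirrors the goal of shaping behavior without changing what counts as solving the task.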
Findings
FDPP effectively aligns policies with new human preferences.
Incorporating KL regularization prevents overfitting during fine-tuning.
FDPP maintains original task performance while customizing behavior.
Abstract
Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently achieved considerable success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), aligning the pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents over-fitting and…
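One common way to add KL regularization when fine-tuning a diffusion policy with RL is to subtract a penalty from the learned reward. The penalty is the KL divergence between the fine-tuned and frozen pre-trained policies, taken over the Gaussian denoising steps. The abstract does not give FDPP's exact formulation, so the following is a sketch under stated assumptions. It assumes DDPM-style denoising steps, each a diagonal Gaussian given as a (mean, std) pair. It also assumes the per-step KLs along one sampled denoising chain are summed, which gives a single-sample estimate of the chain-level KL by the chain rule. The helper names and the coefficient `beta` are hypothetical.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for diagonal Gaussians, summed over dims."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def regularized_reward(learned_reward, finetuned_steps, pretrained_steps, beta):
    """Learned preference reward minus beta times the KL summed over denoising steps.

    Each element of finetuned_steps / pretrained_steps is a (mean, std) pair describing
    one Gaussian denoising step of the fine-tuned or frozen pre-trained policy, evaluated
    at the same state and the same noisy action along one sampled denoising chain.
    """
    kl = sum(
        gaussian_kl(mu_f, sd_f, mu_r, sd_r)
        for (mu_f, sd_f), (mu_r, sd_r) in zip(finetuned_steps, pretrained_steps)
    )
    return learned_reward - beta * kl


# Hypothetical two-step denoising chain with a 2-D action.
ref_steps = [([0.0, 0.0], [1.0, 1.0]), ([0.1, -0.1], [0.5, 0.5])]
drift_steps = [([0.5, 0.0], [1.0, 1.0]), ([0.1, -0.1], [0.5, 0.5])]
print(regularized_reward(1.0, ref_steps, ref_steps, beta=0.1))    # 1.0 (no drift, no penalty)
print(regularized_reward(1.0, drift_steps, ref_steps, beta=0.1))  # ~0.9875
```

Larger values of `beta` keep the fine-tuned policy closer to the pre-trained one. This is the mechanism by which such a penalty can discourage over-fitting to the learned reward at the expense of the original task.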
Taxonomy
Topics: ICT Impact and Policies · Merger and Competition Analysis
Methods: Diffusion
