Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang; Anikait Singh; Linqi Zhou; Stefano Ermon; Jiaming Song

arXiv:2501.06655 · cs.LG · January 14, 2025

TL;DR

This paper presents PPD, a multi-reward fine-tuning method for diffusion models that personalizes image generation to individual user preferences using few-shot learning and cross-attention with preference embeddings.

Contribution

Introduces PPD, a novel multi-reward optimization framework that personalizes diffusion models by learning individual preferences from limited examples and generalizes to unseen users.
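PPD builds on DPO-style preference tuning. One way to see the "multi-reward" framing is to treat each user's preferences as a separate reward and condition the same objective on that user's preference embedding. The sketch below is a generic Diffusion-DPO-style loss, not the paper's exact objective. It assumes the noise predictions `pred_*` (tuned model) and `ref_*` (frozen reference) were already computed with the prompt and the user's preference embedding as conditioning, and it folds any timestep weighting into `beta`. All function and argument names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def sq_err(eps, pred):
    # Per-sample squared error between true and predicted noise.
    return np.sum((eps - pred) ** 2, axis=-1)


def diffusion_dpo_loss(eps_w, eps_l, pred_w, pred_l, ref_w, ref_l, beta=1.0):
    """Generic Diffusion-DPO-style loss over a batch of preference pairs.

    eps_w, eps_l:   noise added to the preferred (w) / dispreferred (l) latents
    pred_w, pred_l: noise predicted by the model being tuned, conditioned on
                    the prompt and (by assumption) the user's preference embedding
    ref_w, ref_l:   noise predicted by the frozen reference model, same conditioning
    All arrays have shape (batch, dim).
    """
    model_diff = sq_err(eps_w, pred_w) - sq_err(eps_l, pred_l)
    ref_diff = sq_err(eps_w, ref_w) - sq_err(eps_l, ref_l)
    # The loss drops when the tuned model improves on the reference more for the
    # preferred sample than for the dispreferred one.
    return -np.mean(log_sigmoid(-beta * (model_diff - ref_diff)))


rng = np.random.default_rng(0)
eps_w = rng.standard_normal((8, 16))
eps_l = rng.standard_normal((8, 16))
ref_w = eps_w + 0.5 * rng.standard_normal((8, 16))  # imperfect reference predictions
ref_l = eps_l + 0.5 * rng.standard_normal((8, 16))

# Tuned model identical to the reference: the loss equals log 2.
loss_same = diffusion_dpo_loss(eps_w, eps_l, ref_w, ref_l, ref_w, ref_l)
# Tuned model denoises the preferred sample perfectly: the loss falls below log 2.
loss_better = diffusion_dpo_loss(eps_w, eps_l, eps_w, ref_l, ref_w, ref_l)
print(round(float(loss_same), 4), loss_better < loss_same)  # 0.6931 True
```

In a batch that mixes pairs from many users, each pair would carry its own user's embedding in the conditioning, so a single loss covers all of the per-user rewards at once.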

Findings

01

Achieves a 76% win rate in reflecting user preferences from as few as four preference examples

02

Effectively optimizes for multiple reward functions simultaneously

03

Enables interpolation between different user preferences during inference
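A simple way to picture interpolating between two users at inference time is to blend their preference embeddings before passing them to the model. This sketch assumes linear interpolation in embedding space, which may differ from how the paper performs it. The embeddings and `interpolate_preferences` are hypothetical.

```python
import numpy as np


def interpolate_preferences(emb_a, emb_b, alpha):
    """Linearly blend two user preference embeddings (hypothetical helper).

    alpha=0 returns emb_a, alpha=1 returns emb_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * emb_a + alpha * emb_b


rng = np.random.default_rng(0)
user_a = rng.standard_normal((4, 32))  # hypothetical embedding for user A
user_b = rng.standard_normal((4, 32))  # hypothetical embedding for user B

# Sweep from user A to user B; each blend would serve as the model's preference context.
blends = [interpolate_preferences(user_a, user_b, a) for a in np.linspace(0.0, 1.0, 5)]
print(len(blends), blends[0].shape)  # 5 (4, 32)
```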

Abstract

Reinforcement learning from human feedback (RLHF) techniques such as Direct Preference Optimization (DPO) can significantly improve the generation quality of text-to-image diffusion models. However, these methods optimize for a single reward that aligns model generation with population-level preferences, neglecting the nuances of individual users' beliefs or values. This lack of personalization limits the efficacy of these models. To bridge this gap, we introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences. With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot manner, enabling generalization to unseen users. Specifically, our approach (1) leverages a vision-language model (VLM) to extract personal preference embeddings from a small set of pairwise preference examples, and then (2) incorporates the embeddings into diffusion models through cross-attention.…
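Step (2) can be pictured with a single-head cross-attention layer in which image-latent tokens attend over the conditioning tokens. This minimal NumPy sketch assumes the VLM preference embedding has already been projected to the text-context width and is simply appended to the text tokens as extra context. The paper's actual layer layout, dimensions, and projections may differ, and all shapes and names here are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_img, d_model) image-latent tokens
    context: (n_ctx, d_ctx) conditioning tokens (text + preference)
    """
    q = queries @ w_q                          # (n_img, d_head)
    k = context @ w_k                          # (n_ctx, d_head)
    v = context @ w_v                          # (n_ctx, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_img, n_ctx)
    return softmax(scores, axis=-1) @ v        # (n_img, d_head)


rng = np.random.default_rng(0)
d_model, d_ctx, d_head = 64, 32, 16
img_tokens = rng.standard_normal((10, d_model))  # hypothetical latent tokens
text_tokens = rng.standard_normal((7, d_ctx))    # hypothetical text-encoder output
pref_tokens = rng.standard_normal((4, d_ctx))    # hypothetical projected preference embedding

# Assumption: preference tokens are appended to the text context.
context = np.concatenate([text_tokens, pref_tokens], axis=0)
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_ctx, d_head))
w_v = rng.standard_normal((d_ctx, d_head))

out = cross_attention(img_tokens, context, w_q, w_k, w_v)
print(out.shape)  # (10, 16)
```

Appending to the existing context is the least invasive choice because it reuses the text cross-attention weights; a separate, dedicated cross-attention branch for the preference tokens is an equally plausible alternative.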

Taxonomy

Methods: Diffusion · Direct Preference Optimization · Sparse Evolutionary Training