Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning
Fu-Yun Wang, Keqiang Sun, Yao Teng, Xihui Liu, Jiale Yuan, Jiaming Song, Hongsheng Li

TL;DR
Self-NPO is a data-free, efficient method for negative preference optimization in diffusion models. It improves alignment with human preferences without manual data labeling or reward model training.
Contribution
The paper proposes Self-NPO, a truncated diffusion fine-tuning technique that enables data-free negative preference optimization and substantially reduces training cost while maintaining performance.
Findings
Self-NPO achieves comparable results to data-dependent methods.
It enhances diffusion models' alignment with human preferences.
The method is highly efficient, requiring less than 1% of the training cost of the data-dependent NPO baseline.
Abstract
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. While existing PO methods primarily concentrate on producing favorable outputs, they often overlook the significance of classifier-free guidance (CFG) in mitigating undesirable results. Diffusion-NPO addresses this gap by introducing negative preference optimization (NPO), training models to generate outputs opposite to human preferences and thereby steering them away from unfavorable outcomes through CFG. However, prior NPO approaches rely on costly and fragile procedures for obtaining explicit preference annotations (e.g., manual pairwise labeling or reward model training), limiting their practicality in domains where…
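The role of CFG described in the abstract can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance combination and assumes the negative branch is supplied by an NPO-tuned model; the function name `guided_noise` and the toy values are hypothetical and are not taken from the paper's code.

```python
from typing import List


def guided_noise(eps_cond: List[float], eps_neg: List[float], scale: float) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    eps_cond: prediction from the conditional (preferred) branch.
    eps_neg:  prediction from the negative branch (assumed here to come
              from a model tuned toward dispreferred outputs).
    scale:    guidance scale; values above 1 push the result further
              away from the negative prediction.
    """
    return [n + scale * (c - n) for c, n in zip(eps_cond, eps_neg)]


# Toy predictions for a three-element "latent" (illustrative values only).
eps_cond = [0.2, -0.1, 0.5]
eps_neg = [0.6, -0.3, 0.1]

guided = guided_noise(eps_cond, eps_neg, scale=7.5)
print(guided)
```

With a scale of 0 the result equals the negative prediction, with a scale of 1 it equals the conditional prediction, and larger scales extrapolate away from the negative branch. This is why a stronger negative model can improve guided sampling.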
