Aligning Diffusion Models with Noise-Conditioned Perception

Alexander Gambashidze; Anton Kulikov; Yuriy Sosnin; Ilya Makarov

arXiv:2406.17636·cs.CV·December 3, 2025

TL;DR

This paper introduces a perceptual objective computed in the U-Net embedding space of a diffusion model. Compared with the usual pixel- or VAE-space optimization, it improves human preference alignment, visual quality, and training efficiency.

Contribution

The paper proposes a novel perceptual optimization approach that operates in the U-Net embedding space of a diffusion model, improving preference alignment and reducing training cost.

Findings

01 Outperforms standard latent-space implementations in both quality and computational cost

02 Scores above 60% on both preference and visual appeal with SDXL

03 Significantly reduces computational cost during training

Abstract

Recent advancements in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs, Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception, leading to slower and less efficient training during the preference alignment stage. We propose using a perceptual objective in the U-Net embedding space of the diffusion model to address these issues. Our approach involves fine-tuning Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT) within this embedding space. This method significantly outperforms standard latent-space implementations across various metrics, including quality and computational cost. For…
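To make the idea in the abstract concrete, here is a minimal sketch of a Diffusion-DPO-style preference loss in which the prediction error can be measured either directly in latent space or after passing through an embedding function. It is an illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical stand-in for a frozen U-Net encoder (a real one would also take the noise level or timestep as input), the vectors are toy data, and `beta` is an arbitrary value.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def toy_embed(v):
    """Hypothetical stand-in for frozen U-Net encoder features (assumption)."""
    return [math.tanh(2.0 * x) for x in v]


def dpo_loss(target_w, target_l, pred_w, pred_l, ref_w, ref_l,
             beta=1.0, embed=None):
    """Diffusion-DPO-style loss for one (preferred, rejected) pair.

    target_*: regression targets for the preferred (w) and rejected (l) sample
    pred_*:   predictions of the model being trained
    ref_*:    predictions of the frozen reference model
    embed:    optional map applied before the error is measured;
              None means the error is taken directly in latent space
    """
    f = embed if embed is not None else (lambda v: v)
    # Negative when the trained model fits the sample better than the reference.
    diff_w = mse(f(pred_w), f(target_w)) - mse(f(ref_w), f(target_w))
    diff_l = mse(f(pred_l), f(target_l)) - mse(f(ref_l), f(target_l))
    # Reward fitting the preferred sample better, relative to the rejected one.
    return -log_sigmoid(-beta * (diff_w - diff_l))


# Toy data (hypothetical values, for illustration only).
target_w, target_l = [0.5, -0.2, 0.1], [-0.4, 0.3, 0.9]
ref_w, ref_l = [0.3, 0.0, 0.2], [-0.1, 0.1, 0.5]

# Model identical to the reference: the loss equals log(2) in either space.
loss_same = dpo_loss(target_w, target_l, ref_w, ref_l, ref_w, ref_l)

# Model fits the preferred sample exactly and matches the reference on the
# rejected one: the loss drops below log(2).
loss_latent = dpo_loss(target_w, target_l, target_w, ref_l, ref_w, ref_l)
loss_embed = dpo_loss(target_w, target_l, target_w, ref_l, ref_w, ref_l,
                      embed=toy_embed)
print(loss_same, loss_latent, loss_embed)
```

The only difference between the latent-space and embedding-space variants is the `embed` argument, which mirrors the abstract's framing: the preference objective stays the same, and only the space in which errors are compared changes.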

Code & Models

Models

Hugging Face: alexgambashidze/SDXL_NCP-DPO_v0.1

Taxonomy

Topics: Neural Networks and Applications

Methods: Concatenated Skip Connection · Convolution · Max Pooling · ALIGN · U-Net · Diffusion