Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Min Cheng; Fatemeh Doudi; Dileep Kalathil; Mohammad Ghavamzadeh; Panganamala R. Kumar

arXiv:2505.18547 · cs.AI · March 13, 2026

TL;DR

Diffusion Blend introduces a novel inference-time method for aligning diffusion models with multiple user preferences by blending diffusion processes, enabling flexible, multi-objective image generation without additional fine-tuning.

Contribution

The paper proposes Diffusion Blend, a new approach that allows inference-time multi-preference alignment for diffusion models through process blending, eliminating the need for multiple fine-tuned models.

Findings

1. Outperforms relevant baselines in experiments.
2. Matches or exceeds the performance of individually fine-tuned models.
3. Enables efficient, user-driven multi-objective alignment at inference time.

Abstract

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional…
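A minimal sketch of why a linear combination of rewards can be served at inference time, assuming the standard closed form of KL-regularized fine-tuning: maximizing E[r] - α·KL(p || p_base) gives p*(x) ∝ p_base(x) exp(r(x)/α), so ∇ log p* = ∇ log p_base + ∇r/α. If each expert model i was fine-tuned on reward r_i at a reference strength α_ref, then for user weights w_i and a user strength α the target score is s_base + (α_ref/α) Σ_i w_i (s_i - s_base). This identity is exact for the clean-data distribution but only approximate at noisy diffusion timesteps. The function name `blend_scores`, the symbols α_ref and w_i, and the plain-list score vectors are hypothetical illustrations, not the paper's code or its exact blending rule.

```python
from collections.abc import Sequence


def blend_scores(
    base_score: Sequence[float],
    expert_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha_ref: float,
    alpha: float,
) -> list[float]:
    """Combine per-reward score estimates into one score for a target preference.

    Hypothetical helper. Assumes expert i was fine-tuned on reward r_i with KL
    strength alpha_ref, so (expert_i - base) approximates grad r_i / alpha_ref.
    Returns base + (alpha_ref / alpha) * sum_i w_i * (expert_i - base).
    """
    if len(expert_scores) != len(weights):
        raise ValueError("need exactly one weight per expert model")
    if alpha <= 0 or alpha_ref <= 0:
        raise ValueError("KL strengths must be positive")
    scale = alpha_ref / alpha  # weaker regularization (smaller alpha) -> larger pull
    blended = list(base_score)  # start from the base model's score
    for w, expert in zip(weights, expert_scores):
        if len(expert) != len(base_score):
            raise ValueError("all score vectors must have the same length")
        for j, (e, b) in enumerate(zip(expert, base_score)):
            # add the weighted reward direction learned by expert i
            blended[j] += scale * w * (e - b)
    return blended
```

For example, `blend_scores([0.0, 1.0], [[1.0, 1.0], [0.0, 3.0]], [0.5, 0.5], alpha_ref=1.0, alpha=0.5)` returns `[1.0, 3.0]`: halving α relative to α_ref doubles the pull of both experts away from the base score. With a one-hot weight vector and α = α_ref, the function simply returns that expert's score, so no extra fine-tuning is needed to recover any single expert.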

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The problem is novel and well-motivated. The paper formalizes the inference-time multi-preference alignment problem from the perspective of MORL, which is practically important and underexplored in the diffusion literature.
2. The approach is theoretically justified, with clear derivations and bounds on approximation errors.
3. Strong empirical results on multiple datasets and reward functions.

Weaknesses

1. The Jensen-gap approximations in Eq. 8 are only validated empirically via downstream metrics; a direct error analysis of $\Delta(r, \alpha)$ (beyond the bounds in the Appendix) would strengthen the claims.
2. The paper builds on KL-regularized RL fine-tuning for diffusion models, but does not sufficiently discuss or compare to DiffusionDPO (Wallace et al., 2024), DDPO (Black et al., 2024), etc., which are dominant lines of work in this space. For example, the authors cite DPO (Rafailov et al., 2023) for diffu…
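Reviewer 01's point about Jensen-gap approximations can be made concrete with a toy calculation. The page does not reproduce Eq. 8 or the definition of Δ(r, α), so this sketch illustrates only a generic Jensen-type gap of that kind, under stated assumptions: it takes the soft value of a reward to be V(r) = α log E[exp(r(x_0)/α) | x_t], replaces p(x_0 | x_t) with a hypothetical two-outcome distribution, and measures how far the linear blend of two soft values is from the soft value of the blended reward. The functions `soft_value` and `jensen_gap` are hypothetical names, not code from the paper or its repository.

```python
import math
from collections.abc import Sequence


def soft_value(rewards: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """alpha * log E_p[exp(r / alpha)] over a discrete set of clean outcomes."""
    scaled = [r / alpha for r in rewards]
    m = max(scaled)  # shift by the max for numerical stability (log-sum-exp trick)
    total = sum(p * math.exp(s - m) for p, s in zip(probs, scaled))
    return alpha * (m + math.log(total))


def jensen_gap(
    r1: Sequence[float],
    r2: Sequence[float],
    probs: Sequence[float],
    w: float,
    alpha: float,
) -> float:
    """Gap between blending soft values and the soft value of the blended reward.

    Equals w*V(r1) + (1-w)*V(r2) - V(w*r1 + (1-w)*r2). Because V is convex in
    the reward vector, the gap is nonnegative for 0 <= w <= 1.
    """
    mixed = [w * a + (1 - w) * b for a, b in zip(r1, r2)]
    return (
        w * soft_value(r1, probs, alpha)
        + (1 - w) * soft_value(r2, probs, alpha)
        - soft_value(mixed, probs, alpha)
    )


# Toy stand-in for p(x_0 | x_t): two equally likely clean images whose
# rewards conflict (image A scores well on r1, image B on r2).
probs = [0.5, 0.5]
r1, r2 = [1.0, 0.0], [0.0, 1.0]
for alpha in (0.01, 1.0, 100.0):
    print(alpha, jensen_gap(r1, r2, probs, w=0.5, alpha=alpha))
```

With these conflicting rewards the gap is about 0.12 at α = 1, approaches 0.5 as α shrinks toward 0, and vanishes as α grows (where V(r) tends to the plain expectation E[r]); it is exactly zero when the two rewards coincide. This is one way to see why an error term of this kind can depend on both the rewards and the regularization strength.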

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper is well written and easy to follow.
2. The study of inference-time alignment for diffusion models is novel and interesting.
3. The authors conduct extensive experiments to verify the effectiveness of their method.

Weaknesses

1. It would be better for the authors to include results on larger models such as SDXL to further demonstrate the effectiveness of their method.
2. It would be better to demonstrate that the model can be used in broader applications such as image editing.
3. The authors use DPOK to fine-tune the models; is the method sensitive to the choice of RL algorithm?

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- Importance of the problem: Dynamically aligning a diffusion model's generation process with user preferences is a challenging problem that can have a large impact on content-creation applications by giving users knobs to adjust their content.
- This paper proposes a simple algorithm that achieves this by combining the scores of diffusion models trained for specific rewards. The approach is well justified theoretically, and the simplicity of the algorithm would enable easy adoption of the me…

Weaknesses

- The experiments use a relatively small number of reward models (most with two) and a single class of diffusion model, Stable Diffusion. The results could have been stronger by pushing the limit with more rewards (e.g., 6 to 10) and with more recent diffusion models such as Flux.
- In practice, LoRA-based diffusion model weight combination techniques (e.g., Zou, Shen, Bouganis, and Zhao, ICLR 2025) could be a strong candidate for achieving the same purpose. How does it perform, and is there…

Code & Models

Repositories

bluewoods127/db-2025 · jax · Official

Taxonomy

Methods: Diffusion · Balanced Selection · ALIGN · Sparse Evolutionary Training