Stitched Value Model for Diffusion Alignment

Hyojun Go; Hyungjin Chung; Prune Truong; Goutam Bhat; Li Mi; Zhaochong An; Zixiang Zhao; Dominik Narnhofer; Serge Belongie; Federico Tombari; Konrad Schindler

arXiv:2605.19804 · cs.CV · May 20, 2026

TL;DR

StitchVM is a lightweight framework that efficiently transfers pretrained pixel-space reward models to noisy latent space in diffusion models, improving alignment and computational efficiency.

Contribution

The paper introduces a model stitching method that combines existing pixel-space reward models with diffusion backbones, enabling fast and effective diffusion alignment.

Findings

1. DPS becomes 3.2 times faster and uses half the GPU memory.

2. DiffusionNFT becomes 2.3 times faster.

3. The stitching process takes only 10 GPU-hours for models like CLIP ViT-L and SD 3.5 Medium.

Abstract

For practical use, diffusion- or flow-based generative models must be aligned with task-specific rewards, such as prompt fidelity or aesthetic preference. That alignment is challenging because the reward is defined for clean output images, but the alignment procedure requires value function estimates at noisy intermediate latents. Existing methods resort to Tweedie-style or Monte Carlo approximations, trading off estimator bias against computational cost: Tweedie estimates are efficient but biased, while Monte Carlo estimates are more accurate but require expensive rollouts. A natural alternative would be a learned value function, but it remains an open question how to effectively train a strong and general value model specifically for noisy latents. Here, we propose StitchVM, a model stitching framework that efficiently transfers reward models pretrained for clean images to the noisy…
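The bias versus cost trade-off described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical one-dimensional setting where the clean sample is x0 ~ N(0, 1), the noisy latent is x_t = x0 + sigma * eps, and the reward is r(x0) = x0^2. Under those assumptions the posterior of x0 given x_t is Gaussian in closed form, so the exact value E[r(x0) | x_t] can be compared against a Tweedie-style estimate (the reward evaluated at the posterior mean) and a Monte Carlo estimate (the reward averaged over posterior samples). All function names are illustrative.

```python
import math
import random


def reward(x0):
    # Hypothetical nonlinear reward defined on clean samples only.
    return x0 ** 2


def posterior_params(x_t, sigma):
    # Assumed toy model: x0 ~ N(0, 1), x_t = x0 + sigma * eps, eps ~ N(0, 1).
    # The posterior x0 | x_t is Gaussian with this mean and variance.
    mean = x_t / (1.0 + sigma ** 2)
    var = sigma ** 2 / (1.0 + sigma ** 2)
    return mean, var


def exact_value(x_t, sigma):
    # E[x0^2 | x_t] = mean^2 + var for a Gaussian posterior.
    mean, var = posterior_params(x_t, sigma)
    return mean ** 2 + var


def tweedie_value(x_t, sigma):
    # Tweedie-style estimate: plug the posterior mean into the reward.
    # One reward call, but biased whenever the reward is nonlinear.
    mean, _ = posterior_params(x_t, sigma)
    return reward(mean)


def monte_carlo_value(x_t, sigma, num_samples, rng):
    # Monte Carlo estimate: average the reward over posterior samples.
    # In a real diffusion model each sample would need a full rollout.
    mean, var = posterior_params(x_t, sigma)
    std = math.sqrt(var)
    total = sum(reward(rng.gauss(mean, std)) for _ in range(num_samples))
    return total / num_samples


rng = random.Random(0)
x_t, sigma = 1.5, 1.0
v_exact = exact_value(x_t, sigma)
v_tweedie = tweedie_value(x_t, sigma)
v_mc = monte_carlo_value(x_t, sigma, 100_000, rng)
print(f"exact={v_exact:.4f} tweedie={v_tweedie:.4f} monte_carlo={v_mc:.4f}")
```

In this toy setting the Tweedie-style estimate falls short of the exact value by exactly the posterior variance (0.5 for the inputs above), while the Monte Carlo estimate approaches the exact value only as the number of samples grows. A learned value model, as the abstract proposes, aims to avoid both the bias and the per-query sampling cost.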
