Hierarchical Variational Policies for Reward-Guided Diffusion

Kushagra Pandey; Farrin Marouf Sofian; Jan Niklas Groeneveld; Felix Draxler; Stephan Mandt

arXiv:2605.21661 · cs.LG · May 22, 2026


TL;DR

This paper introduces a hierarchical variational policy framework for reward-guided diffusion models. It generates high-quality, reward-aligned samples for downstream objectives such as inverse problems at reduced inference cost.

Contribution

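The paper formulates test-time adaptation as a hierarchical variational model and amortizes control into a lightweight yet expressive stochastic policy. This supports few-step diffusion sampling: large steps keep inference fast, while the learned policy maintains sample quality.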

Findings

01. Achieves better perceptual quality with over 5x faster inference on 4x super-resolution.

02. Matches or exceeds recent test-time scaling baselines in quality-speed tradeoff.

03. Extends to semi-amortized regimes for state-of-the-art inverse problem solutions.

Abstract

Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality-speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality…
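
To make the idea of per-step control in a few-step sampler concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The denoiser is a closed-form stand-in for a pretrained model (it assumes a standard normal data prior), the sampler is a generic deterministic Euler step in a noise-level parameterization, and `GaussianPolicy`, `toy_denoiser`, and `few_step_sample` are hypothetical names. The policy here is untrained; the paper's variational training objective is not reproduced.

```python
import numpy as np

def toy_denoiser(x_t, sigma):
    """Stand-in for a pretrained denoiser (assumption): the posterior mean
    E[x0 | x_t] when x0 ~ N(0, I) and x_t = x0 + sigma * noise."""
    return x_t / (1.0 + sigma**2)

class GaussianPolicy:
    """Hypothetical lightweight stochastic policy: one linear map and one
    log-std per sampling step, conditioned on the state x and observation y."""
    def __init__(self, dim, obs_dim, n_steps, rng):
        self.W = 0.01 * rng.standard_normal((n_steps, dim, dim + obs_dim))
        self.log_std = np.full(n_steps, -3.0)

    def sample(self, k, x, y, rng):
        mean = self.W[k] @ np.concatenate([x, y])
        return mean + np.exp(self.log_std[k]) * rng.standard_normal(x.shape)

def few_step_sample(policy, y, dim, sigmas, rng):
    """Run len(sigmas) - 1 large sampling steps; at each step the policy
    adds a stochastic control to the denoised estimate."""
    x = sigmas[0] * rng.standard_normal(dim)  # start from pure noise
    for k in range(len(sigmas) - 1):
        s, s_next = sigmas[k], sigmas[k + 1]
        x0_hat = toy_denoiser(x, s) + policy.sample(k, x, y, rng)
        # Euler step toward the controlled estimate; with s_next == 0
        # the state lands exactly on x0_hat.
        x = x0_hat + (s_next / s) * (x - x0_hat)
    return x

rng = np.random.default_rng(0)
sigmas = np.array([10.0, 2.0, 0.5, 0.0])  # three large steps
policy = GaussianPolicy(dim=4, obs_dim=2, n_steps=len(sigmas) - 1, rng=rng)
y = np.ones(2)                             # placeholder observation
sample = few_step_sample(policy, y, dim=4, sigmas=sigmas, rng=rng)
print(sample.shape)
```

The only inference-time cost the policy adds is one small matrix-vector product per step. That is the sense in which control is amortized in this sketch: nothing is optimized per sample at test time.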
