Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt; Xindi Wu; Matan Atzmon; James Lucas; Jonathan Lorraine

arXiv:2605.21489·cs.LG·May 21, 2026

TL;DR

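The paper introduces CARV, a compute-aware variance-accounting framework for the Monte Carlo expectations that pipelines built on frozen diffusion teachers consume. In the reported text-to-3D distillation and attribution experiments it delivers 2-3x effective compute multipliers without changing the objective.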

Contribution

CARV motivates a hierarchical Monte Carlo estimator that amortizes the expensive upstream computation (rendering, simulation, encoding) over cheap diffusion-noise resamples, then sharpens it with timestep importance sampling and a stratified inverse-CDF construction.

Findings

01

CARV delivers 2-3x effective compute multipliers in text-to-3D distillation and attribution experiments.

02

Variance reduction techniques cut gradient variance by an order of magnitude in single-step distillation.

03

Most of the compute gain comes from amortized reuse of upstream computation; importance sampling plus stratification adds roughly 25% more.
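The amortized-reuse idea can be illustrated with a toy sketch. This is not the paper's implementation: `expensive_upstream`, `teacher_term`, and the integrand are hypothetical stand-ins, and the outer inputs are held fixed so that all variance comes from the inner noise draws. It only shows that several cheap inner resamples per upstream call lower estimator variance at the same number of upstream calls.

```python
import numpy as np

def expensive_upstream(theta, counter):
    # Hypothetical stand-in for rendering / simulation / encoding.
    counter["calls"] += 1
    return np.tanh(theta)

def teacher_term(x, t, eps):
    # Toy stand-in for a frozen teacher's per-sample contribution
    # (an assumption for illustration, not a real diffusion model).
    noisy = np.sqrt(1.0 - t) * x + np.sqrt(t) * eps
    return (1.0 - t) * noisy

def hierarchical_estimate(thetas, inner_k, rng, counter):
    outer_means = []
    for theta in thetas:
        x = expensive_upstream(theta, counter)   # paid once per outer draw
        t = rng.uniform(0.0, 1.0, size=inner_k)  # cheap noise-level resamples
        eps = rng.standard_normal(inner_k)       # cheap Gaussian resamples
        outer_means.append(teacher_term(x, t, eps).mean())
    return float(np.mean(outer_means))

def empirical_variance(thetas, inner_k, repeats, seed):
    rng = np.random.default_rng(seed)
    counter = {"calls": 0}
    estimates = [
        hierarchical_estimate(thetas, inner_k, rng, counter)
        for _ in range(repeats)
    ]
    return float(np.var(estimates)), counter["calls"]

# Fixed outer inputs: only inner (noise) variance is measured here.
thetas = np.linspace(-1.0, 1.0, 8)
var_k1, calls_k1 = empirical_variance(thetas, inner_k=1, repeats=300, seed=0)
var_k16, calls_k16 = empirical_variance(thetas, inner_k=16, repeats=300, seed=0)
print(f"upstream calls: {calls_k1} vs {calls_k16}")
print(f"variance with 1 inner draw: {var_k1:.5f}, with 16: {var_k16:.5f}")
```

In a real pipeline the outer inputs are also random, and inner resampling does nothing for that part of the variance, so the achievable gain depends on how the total variance splits between the two levels and on the relative cost of an inner draw.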

Abstract

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in…
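A minimal sketch of timestep importance sampling with a stratified inverse-CDF draw follows, under stated assumptions: a discrete grid of `T` timesteps, a uniform target over timesteps, and a made-up integrand `f_vals` and proposal `q`. None of these choices come from the paper; the sketch only shows the generic construction the abstract names.

```python
import numpy as np

def stratified_inverse_cdf_sample(q, n, rng):
    """Draw n timestep indices from proposal q using one uniform per stratum."""
    cdf = np.cumsum(q)
    cdf = cdf / cdf[-1]                           # guard against rounding drift
    u = (np.arange(n) + rng.uniform(size=n)) / n  # u_i lies in [i/n, (i+1)/n)
    idx = np.searchsorted(cdf, u, side="right")   # inverse CDF on the grid
    return np.minimum(idx, len(q) - 1)

def importance_estimate(f_vals, q, idx):
    """Estimate the uniform-over-timesteps mean of f using draws from q."""
    p = 1.0 / len(q)                              # uniform target probability
    return float(np.mean(f_vals[idx] * p / q[idx]))

T = 1000
x = np.arange(T) / T
f_vals = np.exp(-5.0 * x)        # hypothetical per-timestep integrand
weights = np.exp(-3.0 * x) + 0.05  # hypothetical proposal shape, strictly positive
q = weights / weights.sum()

rng = np.random.default_rng(0)
idx = stratified_inverse_cdf_sample(q, n=256, rng=rng)
estimate = importance_estimate(f_vals, q, idx)
exact = float(f_vals.mean())
print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

The proposal must be strictly positive wherever the integrand is nonzero, otherwise the reweighted estimate is biased. Stratifying the uniforms before the inverse-CDF map spreads the draws evenly across the proposal's mass, which is what removes most of the sampling noise in the timestep dimension.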
