Gradient-Free Noise Optimization for Reward Alignment in Generative Models
Jeongsol Kim, Hongeun Kim, Jian Wang, Jong Chul Ye

TL;DR
ZeNO is a gradient-free noise optimization framework for reward alignment in generative models. It enables effective inference-time scaling without backpropagation and is demonstrated on diverse tasks, including protein structure generation.
Contribution
The paper presents ZeNO, a novel zeroth-order noise optimization method formulated as a path-integral control problem, applicable to deterministic generators and non-differentiable rewards.
Findings
ZeNO performs well across various generators and reward functions.
It enables inference-time scaling without backpropagation.
It is demonstrated to be effective on protein structure generation.
Abstract
Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, limiting applicability to differentiable settings. To address this, here we present ZeNO (Zeroth-order Noise Optimization), a gradient-free framework that formulates noise optimization as a path-integral control problem, estimable from zeroth-order reward evaluations alone. When instantiated with an Ornstein-Uhlenbeck reference process, the update connects to Langevin dynamics implicitly targeting a reward-tilted distribution. ZeNO enables effective inference-time scaling and demonstrates strong performance across diverse generators and reward functions, including a protein…
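To make the idea of optimizing noise from reward evaluations alone more concrete, here is a minimal sketch of a generic zeroth-order noise search. It is not the ZeNO algorithm from the paper, whose update rule is not given in this summary. It assumes a toy deterministic generator (`generator`), a piecewise-constant black-box reward (`reward`), an Ornstein-Uhlenbeck-style proposal that keeps candidates close to a standard Gaussian, and exponential reward weighting in the spirit of path-integral control. All function names, parameters, and default values are hypothetical.

```python
import numpy as np


def generator(z):
    # Hypothetical deterministic generator: a fixed nonlinear map from noise to a sample.
    return np.tanh(z)


def reward(x):
    # Hypothetical black-box reward: counts coordinates within 0.1 of the value 0.5.
    # It is piecewise constant, so its gradient carries no useful signal.
    return float(np.sum(np.abs(x - 0.5) < 0.1))


def softmax_weights(rewards, temperature):
    # Exponential reward weights, stabilized by subtracting the maximum reward.
    logits = (rewards - rewards.max()) / temperature
    w = np.exp(logits)
    return w / w.sum()


def zeroth_order_noise_search(z0, steps=50, num_candidates=32,
                              beta=0.1, temperature=0.5, seed=0):
    # Assumption: this is a generic reward-weighted update, not the paper's method.
    rng = np.random.default_rng(seed)
    z = z0.copy()
    best_z, best_r = z.copy(), reward(generator(z))
    for _ in range(steps):
        # OU-style proposal: shrink the current noise and mix in fresh Gaussian noise.
        eps = rng.standard_normal((num_candidates,) + z.shape)
        candidates = np.sqrt(1.0 - beta) * z + np.sqrt(beta) * eps
        # Only forward evaluations of the generator and reward are used.
        rewards = np.array([reward(generator(c)) for c in candidates])
        k = int(rewards.argmax())
        if rewards[k] > best_r:
            best_z, best_r = candidates[k].copy(), float(rewards[k])
        # Move the noise to the reward-weighted average of the candidates.
        w = softmax_weights(rewards, temperature)
        z = np.tensordot(w, candidates, axes=1)
    return best_z, best_r
```

Calling `zeroth_order_noise_search(np.random.default_rng(1).standard_normal(8))` returns the best noise vector found and its reward. In this sketch, raising `num_candidates` or `steps` spends more reward evaluations at inference time, and no gradient of the generator or the reward is ever computed.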
