Learnable Chernoff Baselines for Inference-Time Alignment
Sunil Madhow, Yuchen Liang, Ness Shroff, Yingbin Liang, Yu-Xiang Wang

TL;DR
This paper introduces Learnable Chernoff Baselines (LCBs), a method for inference-time reward-guided alignment of generative models that closely matches ideal rejection sampling while using substantially fewer queries to the pretrained model.
Contribution
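The paper proposes LCBs, which approximately sample from the exponentially tilted kernels that arise from KL-regularized reward alignment. LCBs need only black-box sampling access to the pretrained model and use rejection sampling with adaptively selected acceptance probabilities, giving fine-grained control over inference-compute scaling.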
Findings
In both continuous and discrete diffusion experiments, LCB sampling closely matches ideal rejection sampling.
LCBs require substantially fewer queries to pretrained models.
The method comes with total-variation guarantees relative to the ideal aligned model.
Abstract
We study inference-time reward-guided alignment for generative models. Existing methods often rely on either architecture-specific adaptations or computationally costly inference procedures. We introduce Learnable Chernoff Baselines (LCBs) as a method for efficiently and approximately sampling from the exponentially tilted kernels that arise from KL-regularized reward alignment. Using only black-box sampling access to the pretrained model, LCBs implement a form of rejection sampling with adaptively selected acceptance probabilities, which allows fine-grained control over inference-compute scaling. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling while using substantially fewer queries to the pretrained model.
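For readers unfamiliar with the "ideal rejection sampling" baseline mentioned in the abstract, the following is a minimal sketch of textbook rejection sampling from an exponentially tilted distribution p(x) ∝ p0(x) exp(r(x) / β), where p0 is the pretrained sampler and r is a reward bounded above by a known constant. It is not the LCB algorithm, whose adaptively selected acceptance probabilities are not specified on this page. The names `tilted_rejection_sample`, `sample_base`, `reward`, `beta`, and `reward_max` are hypothetical and chosen only for illustration.

```python
import math
import random


def tilted_rejection_sample(sample_base, reward, beta, reward_max, rng,
                            max_tries=100_000):
    """Draw one sample from p(x) proportional to p0(x) * exp(reward(x) / beta).

    Assumption (not from the paper): reward(x) <= reward_max for every x,
    so the acceptance probability below always lies in (0, 1].
    Returns (sample, number_of_base_queries_used).
    """
    for n_queries in range(1, max_tries + 1):
        x = sample_base(rng)  # one black-box query to the base sampler
        accept_prob = math.exp((reward(x) - reward_max) / beta)
        if rng.random() < accept_prob:
            return x, n_queries
    raise RuntimeError("no proposal accepted within max_tries")


def demo(n_samples=20_000, seed=0):
    """Toy check: uniform base on {0, 1, 2, 3} with reward(x) = x, beta = 1."""
    rng = random.Random(seed)
    draws = [
        tilted_rejection_sample(
            sample_base=lambda r: r.randrange(4),
            reward=float,
            beta=1.0,
            reward_max=3.0,
            rng=rng,
        )
        for _ in range(n_samples)
    ]
    freq_of_3 = sum(1 for x, _ in draws if x == 3) / n_samples
    mean_queries = sum(q for _, q in draws) / n_samples
    return freq_of_3, mean_queries


if __name__ == "__main__":
    freq_of_3, mean_queries = demo()
    print(f"empirical P(x = 3): {freq_of_3:.3f}")
    print(f"mean base queries per accepted sample: {mean_queries:.2f}")
```

In this toy setting the target probability of x = 3 is e^3 / (1 + e + e^2 + e^3), roughly 0.64, and each accepted sample costs about 2.6 base queries on average. The query cost of this exact scheme grows as the tilt sharpens (smaller β or a looser bound `reward_max`), which is the inference cost the abstract says LCBs reduce.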
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Machine Learning in Materials Science
