Meta Flow Maps enable scalable reward alignment
Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, Yee Whye Teh

TL;DR
Meta Flow Maps (MFMs) make reward alignment in generative models scalable by enabling one-step stochastic posterior sampling, which reduces the computational cost of inference-time steering and fine-tuning.
Contribution
We introduce Meta Flow Maps, a framework that extends consistency models and flow maps into the stochastic regime, allowing efficient posterior sampling and improved reward alignment in generative models.
Findings
Steered-MFM outperforms baseline on ImageNet across multiple rewards.
MFMs enable inference-time steering without inner rollouts.
MFMs facilitate unbiased, off-policy fine-tuning for general rewards.
Abstract
Controlling generative models is computationally expensive. This is because optimal alignment with a reward function--whether via inference-time steering or fine-tuning--requires estimating the value function. This task demands access to the conditional posterior, the distribution of clean data consistent with an intermediate state, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and flow maps into the stochastic regime. MFMs are trained to perform stochastic one-step posterior sampling, generating arbitrarily many i.i.d. draws of clean data from any intermediate state. Crucially, these samples provide a differentiable reparametrization that unlocks efficient value function estimation. We leverage this…
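As a minimal illustration of why i.i.d. posterior draws make value estimation cheap, the sketch below uses a 1-D Gaussian toy problem in which the posterior of clean data given an intermediate state is known in closed form. That closed-form sampler is only a stand-in for a trained MFM, not the paper's model. The linear interpolant x_t = (1 - t) x_0 + t x_1, the quadratic reward, the log-mean-exp form of the value function, and all function names are assumptions made for this example rather than details taken from the abstract.

```python
import math
import random


def posterior_params(x_t, t, data_mean=0.0, data_std=1.0):
    """Mean and variance of p(x_1 | x_t) in the assumed toy setup:
    x_t = (1 - t) * x_0 + t * x_1, x_0 ~ N(0, 1), x_1 ~ N(data_mean, data_std^2)."""
    var_t = (1.0 - t) ** 2 + (t * data_std) ** 2   # marginal variance of x_t
    gain = t * data_std ** 2 / var_t               # Cov(x_1, x_t) / Var(x_t)
    mean = data_mean + gain * (x_t - t * data_mean)
    var = data_std ** 2 * (1.0 - t) ** 2 / var_t
    return mean, var


def sample_posterior(x_t, t, n, rng):
    """Hypothetical stand-in for a one-step stochastic sampler:
    returns n i.i.d. draws of clean data x_1 given the state x_t."""
    mean, var = posterior_params(x_t, t)
    std = math.sqrt(var)
    return [mean + std * rng.gauss(0.0, 1.0) for _ in range(n)]


def reward(x_1, target=1.0):
    """Assumed example reward: negative squared distance to a target."""
    return -(x_1 - target) ** 2


def soft_value(x_t, t, n=20000, seed=0):
    """Monte Carlo estimate of log E[exp(r(x_1)) | x_t] from posterior draws.
    Subtracting the maximum reward keeps the exponentials numerically stable."""
    rng = random.Random(seed)
    rewards = [reward(x) for x in sample_posterior(x_t, t, n, rng)]
    r_max = max(rewards)
    return r_max + math.log(sum(math.exp(r - r_max) for r in rewards) / n)


def soft_value_exact(x_t, t, target=1.0):
    """Closed-form value for the Gaussian posterior and quadratic reward,
    used only to sanity-check the Monte Carlo estimate."""
    mean, var = posterior_params(x_t, t)
    return -0.5 * math.log(1.0 + 2.0 * var) - (mean - target) ** 2 / (1.0 + 2.0 * var)
```

Comparing soft_value(0.3, 0.5) with soft_value_exact(0.3, 0.5) shows the estimate agreeing with the closed form up to Monte Carlo error. No trajectory is simulated at any point: the estimate uses a single batch of posterior samples.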
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
