Inference-Time Compute Scaling For Flow Matching
Adam Stecklov, Noah El Rimawi-Fine, Mathieu Blanchette

TL;DR
This paper introduces new inference-time scaling methods for Flow Matching that improve sample quality across image and scientific domains without sacrificing sampling efficiency.
Contribution
It presents novel inference-time scaling procedures for Flow Matching that maintain linear interpolants, enabling quality improvements in diverse scientific and visual tasks.
Findings
Sample quality improves with increased inference compute.
Flow matching scaling applies successfully to scientific domains.
Method preserves efficient sampling while enhancing results.
Abstract
Allocating extra computation at inference time has recently improved sample quality in large language models and diffusion-based image generation. In parallel, Flow Matching (FM) has gained traction in language, vision, and scientific domains, but inference-time scaling methods for it remain under-explored. Concurrently, Kim et al. (2025) approach this problem but replace the linear interpolant with a non-linear variance-preserving (VP) interpolant at inference, sacrificing FM's efficient and straight sampling. Additionally, inference-time compute scaling for flow matching has so far only been applied to visual tasks such as image generation. We introduce novel inference-time scaling procedures for FM that preserve the linear interpolant during sampling. Evaluations of our method on image generation, and for the first time (to the best of our knowledge), unconditional protein generation, show…
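To make "efficient and straight sampling" concrete, here is a minimal sketch under a deliberately trivial, hypothetical assumption: the data distribution is a single point `mu`, so the exact FM velocity for the linear interpolant x_t = (1 - t) x0 + t mu is known in closed form. The names `euler_sample` and `point_mass_velocity` are illustrative only, not from the paper. Because the velocity is constant along each path, fixed-step Euler lands on the target with any number of steps, even one.

```python
import numpy as np


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # never reaches 1, so (1 - t) below stays positive
        x = x + dt * velocity(x, t)
    return x


# Hypothetical toy target: every data point equals mu (a point mass).
mu = np.array([2.0, -1.0])


def point_mass_velocity(x, t):
    # Exact FM velocity for the linear interpolant x_t = (1 - t) * x0 + t * mu.
    # Along the path it equals the constant mu - x0, so the trajectory is straight.
    return (mu - x) / (1.0 - t)


rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)  # starting noise sample
for steps in (1, 4, 32):
    print(steps, euler_sample(point_mass_velocity, x0, steps))
```

With a learned velocity field and real data the paths are only approximately straight, so more than one step is normally needed; the toy case just isolates the geometric property the abstract refers to.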
Peer Reviews
Submitted to ICLR 2026
The paper tackles the problem of inference-time optimization for generative models, which is timely and relevant. Improving sample quality or alignment without additional training is valuable for deploying diffusion models under strict computational budgets. Building on sequential Monte Carlo (SMC) style sampling for diffusion models, the use of multiple particles can in principle improve diversity and success rate. The idea of a "noise search" to initialize particles is intuitively plausible.
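The simplest form of the noise search mentioned above is Best-of-N: draw N initial noises, run the sampler from each, and keep the sample a verifier scores highest. The sketch below assumes a generic `sample_fn` and `verifier`; both toy stand-ins are hypothetical and this is not presented as the paper's procedure.

```python
import numpy as np


def best_of_n(sample_fn, verifier, noise_shape, n, rng):
    """Draw n initial noises, run the sampler on each, keep the highest-scoring sample."""
    best_score, best_sample, best_noise = -np.inf, None, None
    for _ in range(n):
        noise = rng.standard_normal(noise_shape)
        sample = sample_fn(noise)  # one full sampling run per candidate
        score = verifier(sample)
        if score > best_score:
            best_score, best_sample, best_noise = score, sample, noise
    return best_sample, best_noise, best_score


# Hypothetical stand-ins for a flow-matching sampler and a reward/verifier model.
def toy_sampler(noise):
    return noise + 0.5


def toy_verifier(x):
    return -float(np.sum(x ** 2))  # prefers samples near the origin


small = best_of_n(toy_sampler, toy_verifier, (8,), 16, np.random.default_rng(0))
large = best_of_n(toy_sampler, toy_verifier, (8,), 64, np.random.default_rng(0))
print(small[2], large[2])
```

Reusing the seed means the 64-candidate run sees the same first 16 noises, so the larger budget can only match or beat the smaller one. Compute here is N full sampler runs plus N verifier calls.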
Unfortunately, the paper in its current form has significant weaknesses that undermine its contributions. The overall novelty and empirical support are not convincing, and several important baselines or design choices appear to have been overlooked. The core idea of improving diversity by projecting or orthogonalizing initial Gaussian noise samples is not well justified. In high-dimensional latent spaces, random Gaussian vectors are already nearly orthogonal to each other due to concentration o
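The reviewer's concentration-of-measure point can be checked numerically. This sketch (the helper name `mean_abs_cosine` is hypothetical) estimates the average absolute cosine similarity between independent standard Gaussian vectors; it shrinks roughly like 1/sqrt(dim), so in latent spaces with thousands of dimensions random noises are already close to orthogonal.

```python
import numpy as np


def mean_abs_cosine(dim, num_vectors, rng):
    """Average |cosine similarity| over distinct pairs of i.i.d. standard Gaussian vectors."""
    v = rng.standard_normal((num_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # normalize rows to unit length
    cos = v @ v.T
    off_diag = cos[~np.eye(num_vectors, dtype=bool)]  # drop self-similarities (= 1)
    return float(np.mean(np.abs(off_diag)))


rng = np.random.default_rng(0)
for dim in (2, 64, 4096):
    # Typical |cos| shrinks roughly like 1 / sqrt(dim).
    print(dim, round(mean_abs_cosine(dim, 32, rng), 4), round(1 / np.sqrt(dim), 4))
```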
- I believe this paper has a well-motivated setting. Inference-time compute scaling via noise search and verifier-guided selection has already been established for diffusion models, with algorithms and framing similar to the Best-of-N and path search used here.
- The randomized ODE that injects score-orthogonal perturbations while staying on the linear FM path appears new relative to EDM-style SDE noise injection and to particle guidance, which previously targeted diffusi
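A "score-orthogonal perturbation" needs, at minimum, a way to strip from a noise vector its component along the score. The sketch below shows only that projection, assuming one flattened inner product per sample; how the paper scales the projected noise or schedules it over time is not described here, so `orthogonal_to_score` and the random stand-in scores are hypothetical.

```python
import numpy as np


def orthogonal_to_score(noise, score, eps=1e-12):
    """Remove the component of `noise` along `score`, separately for each sample in the batch."""
    n = noise.reshape(noise.shape[0], -1)
    s = score.reshape(score.shape[0], -1)
    # Per-sample projection coefficient <n, s> / <s, s>; eps guards against a zero score.
    coef = np.sum(n * s, axis=1, keepdims=True) / (np.sum(s * s, axis=1, keepdims=True) + eps)
    return (n - coef * s).reshape(noise.shape)


rng = np.random.default_rng(0)
score = rng.standard_normal((4, 3, 8, 8))  # hypothetical batch of score estimates
noise = rng.standard_normal((4, 3, 8, 8))
perp = orthogonal_to_score(noise, score)
dots = np.sum((perp * score).reshape(4, -1), axis=1)
print(dots)  # numerically ~0 for every sample
```

Removing the score-aligned part means the injected noise does not push samples directly up or down the estimated log-density, which is one plausible reading of why such noise could leave the trajectory close to the linear path.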
- I have a minor concern about novelty: the particle guidance repulsion [1] and budget forcing [2] are taken directly from prior work (which the authors acknowledge).
- The experimental results show monotonic improvements with larger search budgets. However, compute is not accounted for consistently across methods, which weakens the empirical support.
- Ablations on the noise injection are limited. The EDM and SDE ablations are useful but do not disentangle the roles of score orthogonali
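Particle guidance [1] adds a repulsive term between particles so that jointly sampled trajectories spread out. As a generic illustration (an assumption, not the exact potential used in [1] or in this paper), the sketch below uses a Gaussian (RBF) kernel with a fixed, hypothetical `bandwidth` and returns the negative gradient of the pairwise kernel potential for each particle.

```python
import numpy as np


def rbf_repulsion(x, bandwidth):
    """Repulsive drift on each particle from a Gaussian (RBF) kernel potential.

    x has shape (num_particles, dim). Returns -dU/dx_i for
    U = 0.5 * sum_{i != j} exp(-||x_i - x_j||^2 / (2 h^2)).
    """
    diffs = x[:, None, :] - x[None, :, :]   # (n, n, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)  # (n, n)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Diagonal terms contribute nothing because diffs[i, i] == 0.
    return np.sum(k[:, :, None] * diffs, axis=1) / bandwidth ** 2


particles = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
force = rbf_repulsion(particles, bandwidth=1.0)
moved = particles + 0.1 * force  # one small repulsive step
print(force)
```

Nearby particles receive the largest push, and the forces sum to zero across the set, so the term spreads particles apart without shifting their mean. Each repulsion evaluation is cheap relative to a network call, but a fair compute comparison would still have to count it.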
- The paper addresses inference-time compute scaling for flow-matching models without converting them to diffusion-like samplers.
- The proposed Noise Search and RS+NS algorithms are conceptually straightforward yet effective.
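The reviews do not spell out how Noise Search or RS+NS work, so the following is only a generic illustration of searching in noise space beyond Best-of-N: greedy local search around a pivot noise. Each neighbor mixes the pivot with fresh noise as sqrt(1 - step^2) * pivot + step * eps, which keeps per-coordinate variance at 1 when the pivot is standard Gaussian. The function name, the mixing rule, and the toy sampler and verifier are all assumptions, not the paper's algorithm.

```python
import numpy as np


def local_noise_search(sample_fn, verifier, noise, num_rounds, num_neighbors, step, rng):
    """Greedy hill-climbing in noise space around a pivot noise (hypothetical sketch)."""
    pivot = noise
    best_score = verifier(sample_fn(pivot))
    for _ in range(num_rounds):
        round_best, round_score = None, best_score
        for _ in range(num_neighbors):
            eps = rng.standard_normal(pivot.shape)
            # Variance-preserving mix of the pivot with fresh Gaussian noise.
            cand = np.sqrt(1.0 - step ** 2) * pivot + step * eps
            score = verifier(sample_fn(cand))
            if score > round_score:
                round_best, round_score = cand, score
        if round_best is not None:  # move only if some neighbor improved
            pivot, best_score = round_best, round_score
    return pivot, best_score


# Hypothetical stand-ins for the sampler and verifier.
def toy_sampler(noise):
    return noise + 0.5


def toy_verifier(x):
    return -float(np.sum(x ** 2))


rng = np.random.default_rng(0)
start = rng.standard_normal(8)
pivot, score = local_noise_search(toy_sampler, toy_verifier, start,
                                  num_rounds=5, num_neighbors=8, step=0.3, rng=rng)
print(score)
```

This run costs 1 + 5 * 8 = 41 sampler and verifier calls, which is the kind of count a matched-compute comparison against Best-of-N would need to fix.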
- While the paper’s techniques are effective, many of the core ideas build upon existing strategies rather than inventing new paradigms. The paper's main distinction is well-motivated but represents an incremental methodological refinement.
- The justification for the DMFM-ODE variant is relatively weak. The approach relies on empirically tuned heuristics, and the claim that linear interpolant scaling outperforms VP-trajectory in flow matching lacks sufficient theoretical or experimental support
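For reference when weighing the linear-versus-VP comparison, the two interpolants can be written side by side. This assumes the convention x_0 ~ N(0, I) (noise) and x_1 (data), and uses the trigonometric schedule as one common VP choice (coefficients whose squares sum to 1); it is not necessarily the schedule used by Kim et al. (2025).

```latex
\begin{aligned}
\text{Linear:}\quad & x_t = (1-t)\,x_0 + t\,x_1,
  && \frac{dx_t}{dt} = x_1 - x_0, \\
\text{VP (trigonometric):}\quad & x_t = \cos\!\left(\tfrac{\pi t}{2}\right) x_0 + \sin\!\left(\tfrac{\pi t}{2}\right) x_1,
  && \frac{dx_t}{dt} = \tfrac{\pi}{2}\left[\cos\!\left(\tfrac{\pi t}{2}\right) x_1 - \sin\!\left(\tfrac{\pi t}{2}\right) x_0\right].
\end{aligned}
```

For a fixed pair (x_0, x_1) the linear velocity is constant, so the path is a straight segment; the VP velocity rotates with t, tracing a curved arc, so a single Euler step from t = 0 gives x_0 + (π/2) x_1 rather than x_1.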
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques
