Verifier-Constrained Flow Expansion for Discovery Beyond the Data
Riccardo De Santi, Kimon Protopapas, Ya-Ping Hsieh, Andreas Krause

TL;DR
This paper introduces Flow Expander, a method to adapt pre-trained flow models for scientific discovery by expanding their sampling domain beyond the training data, guided by verifiers to ensure sample validity.
Contribution
It proposes a verifier-constrained flow expansion framework with theoretical guarantees, enabling models to generate more diverse valid samples beyond the original data distribution.
Findings
Successfully expands molecular design space while maintaining validity.
Theoretically guarantees convergence of the flow expansion process.
Demonstrates increased diversity in generated samples in empirical tests.
Abstract
Flow and diffusion models are typically pre-trained on limited available data (e.g., molecular samples), covering only a fraction of the valid design space (e.g., the full molecular space). As a consequence, they tend to generate samples from only a narrow portion of the feasible domain. This is a fundamental limitation for scientific discovery applications, where one typically aims to sample valid designs beyond the available data distribution. To this end, we address the challenge of leveraging access to a verifier (e.g., an atomic bonds checker) to adapt a pre-trained flow model so that its induced density expands beyond regions of high data availability, while preserving sample validity. We introduce formal notions of strong and weak verifiers and propose algorithmic frameworks for global and local flow expansion via probability-space optimization. Then, we present Flow Expander…
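The tension described in the abstract (broader sampling versus sample validity) can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the valid design space is the unit disk, the hypothetical `verifier` checks membership in it, a narrow Gaussian stands in for the pre-trained model, and a wider Gaussian stands in for a naively expanded one.

```python
import numpy as np

def verifier(x):
    """Hypothetical strong verifier: a design is valid iff it lies in the unit disk."""
    return np.linalg.norm(x, axis=-1) <= 1.0

def validity_and_coverage(samples, bins=20):
    """Return (fraction of valid samples, fraction of grid cells hit by valid samples)."""
    valid = verifier(samples)
    kept = samples[valid]
    # Coverage proxy: share of cells of a bins x bins grid over [-1, 1]^2
    # that contain at least one valid sample.
    hist, _, _ = np.histogram2d(
        kept[:, 0], kept[:, 1], bins=bins, range=[[-1, 1], [-1, 1]]
    )
    return valid.mean(), (hist > 0).mean()

rng = np.random.default_rng(0)
# Stand-in for a pre-trained model: concentrated on a small part of the valid region.
narrow = rng.normal(loc=[0.3, 0.0], scale=0.1, size=(5000, 2))
# Stand-in for naive expansion: wider spread, no validity constraint.
wide = rng.normal(loc=[0.3, 0.0], scale=0.6, size=(5000, 2))

narrow_validity, narrow_coverage = validity_and_coverage(narrow)
wide_validity, wide_coverage = validity_and_coverage(wide)
print(f"narrow: validity={narrow_validity:.2f}, coverage={narrow_coverage:.2f}")
print(f"wide:   validity={wide_validity:.2f}, coverage={wide_coverage:.2f}")
```

In this toy setting the wider sampler covers more of the valid region but also produces invalid samples, which is the gap a verifier-constrained expansion is meant to close.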
Peer Reviews
Decision: ICLR 2026 Poster
**Strong motivation and relevance.** The paper tackles an important limitation of pretrained flow and diffusion models—namely, their inability to explore beyond the data manifold while maintaining sample validity. The idea of integrating verifier-based constraints into generative model fine-tuning is timely and relevant for scientific design applications (e.g., molecular or material generation). **Principled formulation.** The proposed *Flow Expander* framework is grounded in a clear optimizati
**Limited and unconvincing experiments.** The empirical evaluation is restricted to 2D toy examples and a small-scale QM9 conformational generation task. These setups are insufficient to demonstrate practical effectiveness or scalability of the proposed method. The results primarily serve as proof-of-concept demonstrations rather than evidence of real-world impact or generalization capability. **Unclear and unverifiable baselines.** The paper cites Uehara et al. (2024, Section 8.2) as the sourc
Major contributions:
• Flow Expander (FE), a principled probability-space optimization scheme
• A theoretical analysis of the proposed algorithm
• An experimental evaluation of FE
It is a well-written paper, with new ideas and interesting results. I think many researchers in our community will appreciate this paper.
• The paper is somewhat dense and not always easy to follow.
• The numerical experiments are somewhat limited. I appreciate both the illustrative examples and the results on the molecular design task, but I wish the paper included more high-impact, real-world examples where verifiers exist.
* Addresses how to leverage pre-trained generative models to explore novel and valid regions of a design space, moving beyond the original data distribution.
* Formalizes the problem into Global Flow Expansion (using a strong verifier) and Local Flow Expansion (using a weak verifier).
* Lifts the optimization objective from the final time-step ($p_1$) to the entire noised state space ($Q^\pi$) to theoretically mitigate the score divergence problem that occurs as $t \to 1$.
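One hedged way to read the "global" formulation summarized above is as entropy maximization over the model's final-time marginal, constrained by the verifier. This is a sketch of the general shape of such a problem, not the paper's exact statement. The symbols $v$ (a verifier returning 1 for valid samples and 0 otherwise) and $\mathcal{H}$ (entropy) are assumptions introduced here; $p_1^\pi$ denotes the final-time marginal $p_1$ induced by the adapted model $\pi$.

$$
\max_{\pi} \; \mathcal{H}\left(p_1^{\pi}\right)
\quad \text{subject to} \quad
\mathbb{E}_{x \sim p_1^{\pi}}\left[ v(x) \right] = 1
$$

Under this reading, the constraint requires that all probability mass sit on verifier-approved samples, while the objective pushes that mass to spread as widely as possible. A "local" variant with a weaker verifier would presumably also keep $p_1^{\pi}$ close to the pre-trained model's marginal, though the exact form of that term is not given in this summary.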
* One weakness is the use of potentially uninformative baselines. The paper compares FE (a "search + constraint" method) against "search-only" (S-MEME/FDC) and "constraint-only" (CONSTR). This comparison is not fully informative, as FE is designed to outperform them. A fair and important baseline would be unconstrained exploration (FDC/S-MEME) followed by post-hoc rejection sampling using the verifier. Without this, the practical value of FE's complex optimization is unknown.
* The method's reli
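The baseline suggested in the review above, unconstrained exploration followed by post-hoc rejection sampling with the verifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the reviewer's or the authors' code: `explore_sampler` is a hypothetical broad Gaussian standing in for a search-only model, and `verifier` is a hypothetical validity check (membership in the unit disk).

```python
import numpy as np

def verifier(x):
    """Hypothetical verifier: a sample is valid iff it lies in the unit disk."""
    return np.linalg.norm(x, axis=-1) <= 1.0

def explore_sampler(rng, n):
    """Hypothetical stand-in for an unconstrained exploration model."""
    return rng.normal(loc=0.0, scale=1.5, size=(n, 2))

def rejection_sample(sampler, verifier, n_target, rng, batch=1024, max_batches=1000):
    """Draw proposals in batches and keep only those the verifier accepts."""
    accepted, n_accepted, n_proposed = [], 0, 0
    for _ in range(max_batches):
        proposals = sampler(rng, batch)
        n_proposed += batch
        keep = proposals[verifier(proposals)]
        accepted.append(keep)
        n_accepted += len(keep)
        if n_accepted >= n_target:
            break
    samples = np.concatenate(accepted)[:n_target]
    # The acceptance rate measures how much sampling effort is wasted.
    return samples, n_accepted / n_proposed

rng = np.random.default_rng(0)
samples, acceptance_rate = rejection_sample(explore_sampler, verifier, 2000, rng)
print(samples.shape, f"acceptance rate: {acceptance_rate:.2f}")
```

Every returned sample is valid by construction, so the relevant comparison is cost: when the valid region is a small fraction of what the exploration model covers, the acceptance rate drops and most model evaluations are discarded.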
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Generative Adversarial Networks and Image Synthesis
