A survey of probabilistic generative frameworks for molecular simulations
Richard John, Lukas Herron, Pratyush Tiwary

TL;DR
This survey compares flow-based and diffusion-based probabilistic generative models for molecular data, evaluating their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry to guide model selection in molecular simulations.
Contribution
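The paper sorts probabilistic generative models into two broad categories, flow-based and diffusion models, and benchmarks three representative models (Neural Spline Flows, Conditional Flow Matching, and Denoising Diffusion Probabilistic Models) on molecular datasets, highlighting their strengths and limitations.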
Findings
Neural Spline Flows are best at capturing mode asymmetry in low-dimensional data.
Conditional Flow Matching performs best on high-dimensional, low-complexity data.
Denoising Diffusion Probabilistic Models are superior for low-dimensional, high-complexity data.
Abstract
Generative artificial intelligence is now a widely used tool in molecular science. Despite the popularity of probabilistic generative models, numerical experiments benchmarking their performance on molecular data are lacking. In this work, we introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models. We select three representative models: Neural Spline Flows, Conditional Flow Matching, and Denoising Diffusion Probabilistic Models, and examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry. Our findings are varied, with no one framework being the best for all purposes. In a nutshell, (i) Neural Spline Flows do best at capturing mode asymmetry present in low-dimensional data, (ii) Conditional Flow Matching outperforms other…
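The abstract refers to datasets with tunable dimensionality and modal asymmetry. As a rough illustration of what such a knob might look like, the sketch below draws from a two-mode Gaussian mixture whose dimension and mode weights are adjustable. This is a hypothetical toy, not the benchmark used in the paper: the function names, the `weight_a` and `separation` parameters, and the choice of a Gaussian mixture are all assumptions made for illustration.

```python
import random

def sample_asymmetric_mixture(n_samples, dim, weight_a=0.8, separation=4.0, seed=0):
    """Draw points from a two-mode Gaussian mixture in `dim` dimensions.

    Hypothetical toy data, not the paper's benchmark: mode A is centred at
    -separation/2 and mode B at +separation/2 along every axis, both with
    unit variance. `weight_a` is the probability of drawing from mode A,
    so values away from 0.5 make the two modes unequally populated.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n_samples):
        in_a = rng.random() < weight_a
        centre = -separation / 2 if in_a else separation / 2
        points.append([rng.gauss(centre, 1.0) for _ in range(dim)])
        labels.append(0 if in_a else 1)
    return points, labels

def mode_fraction(points):
    """Fraction of points assigned to mode A by the sign of their coordinate mean."""
    in_a = sum(1 for p in points if sum(p) / len(p) < 0.0)
    return in_a / len(points)

# Example: 3-dimensional data with an 80/20 split between the two modes.
points, labels = sample_asymmetric_mixture(n_samples=2000, dim=3, weight_a=0.8)
print(len(points), len(points[0]), round(mode_fraction(points), 2))
```

Comparing `mode_fraction` on samples from a trained generative model against the known `weight_a` would give one simple check of whether the model reproduces the relative mode populations.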
Taxonomy
Topics: Protein Structure and Dynamics · DNA and Biological Computing · Gene Regulatory Network Analysis
Methods: Diffusion
