Spatial statistics for screening molecular structures
Pranoy Ray, Surya R. Kalidindi

TL;DR
This paper introduces a physics-informed spatial statistics approach to screening molecular structures. It pairs engineered features with simple models to give accurate predictions from little training data across diverse materials.
Contribution
It proposes a feature engineering method built on spatial correlations computed with Fourier transforms. The resulting representation is low-dimensional and convex, and it outperforms complex deep architectures in data-scarce regimes.
Findings
Achieves sub-2% prediction error with as few as 10 training samples.
Supports Bayesian active learning and zero-shot extrapolation.
Enables lightweight models (<100k parameters) on standard hardware.
Abstract
The dominant paradigm in computational materials discovery relies on heavily parameterized deep architectures, including message-passing graph networks and equivariant models, that require millions of DFT-labeled training structures and produce non-convex latent representations that complicate continuous optimization for inverse design. These architectures are impractical in data-scarce regimes, which is the typical case in molecular screening, and exhibit well-documented limitations in capturing chemically disordered configurations and chiral geometries. This review presents feature engineering based on spatial statistics as a physically rigorous and immediately deployable alternative. Molecular structures are encoded as voxelized scalar fields, and two-point auto- and cross-correlations are evaluated deterministically via Fast Fourier Transforms, explicitly transferring the burden of…
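The FFT-based correlation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `two_point_correlation`, the periodic boundary conditions, the normalization by voxel count, and the random binary field standing in for a voxelized molecule are all assumptions made for illustration. For fields a and b on a grid of S voxels, it evaluates f(r) = (1/S) Σ_s a(s) b(s + r) for every shift r at once using the cross-correlation theorem.

```python
import numpy as np


def two_point_correlation(field_a, field_b=None):
    """Periodic two-point correlation of voxelized scalar fields via FFT.

    Hypothetical helper: returns f(r) = mean over s of a(s) * b(s + r),
    with the zero shift moved to the array center. Passing only field_a
    gives the auto-correlation; passing both gives the cross-correlation.
    """
    a = np.asarray(field_a, dtype=float)
    b = a if field_b is None else np.asarray(field_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("fields must share the same voxel grid")
    # Cross-correlation theorem: F{a (star) b} = conj(F{a}) * F{b}
    spectrum = np.conj(np.fft.fftn(a)) * np.fft.fftn(b)
    corr = np.fft.ifftn(spectrum).real / a.size
    # Move the zero-shift term from index 0 to the center of each axis
    return np.fft.fftshift(corr)


# Placeholder input: a random binary 8x8x8 field, NOT a real molecule
rng = np.random.default_rng(0)
field = (rng.random((8, 8, 8)) > 0.7).astype(float)
corr = two_point_correlation(field)
```

For a binary field, the zero-shift value at the array center equals the fraction of occupied voxels, which is a quick sanity check on the normalization. The transform costs O(S log S) rather than the O(S²) of a direct sum over all voxel pairs.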
