Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning
Can Polat, Hasan Kurban, Erchin Serpedin, Mustafa Kurban

TL;DR
This paper introduces new benchmarks and evaluation protocols for stress-testing multimodal foundation models in crystallography, focusing on their ability to generalize, maintain physical consistency, and produce reliable structural predictions.
Contribution
It presents a multiscale multicrystal dataset and two physically grounded benchmarks to rigorously evaluate multimodal models' generalization and physical consistency in crystallography.
Findings
Models exhibit varying degrees of generalization across spatial and compositional exclusions.
The benchmarks reveal specific failure modes related to physical violations and hallucinations.
The evaluation metrics, including relative errors in lattice parameters and density and a physics-consistency index, quantify model accuracy, consistency, and reliability.
Abstract
Evaluating foundation models for crystallographic reasoning requires benchmarks that isolate generalization behavior while enforcing physical constraints. This work introduces a multiscale multicrystal dataset with two physically grounded evaluation protocols to stress-test multimodal generative models. The Spatial-Exclusion benchmark withholds all supercells of a given radius from a diverse dataset, enabling controlled assessments of spatial interpolation and extrapolation. The Compositional-Exclusion benchmark omits all samples of a specific chemical composition, probing generalization across stoichiometries. Nine vision-language foundation models are prompted with crystallographic images and textual context to generate structural annotations. Responses are evaluated via (i) relative errors in lattice parameters and density, (ii) a physics-consistency index penalizing volumetric…
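The two exclusion protocols and the relative-error metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the `Sample` record, its field names, the function names, and the placeholder values are all assumptions made for illustration, and the sketch covers only the hold-out splits and the relative error.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one supercell; field names are assumptions."""
    composition: str   # placeholder chemical label, e.g. "A" or "B"
    radius: float      # supercell radius, in whatever units the dataset uses
    lattice_a: float   # reference lattice parameter
    density: float     # reference density


def spatial_exclusion_split(samples, held_out_radius):
    """Withhold every supercell of one radius; the rest remain as context."""
    held_out = [s for s in samples if math.isclose(s.radius, held_out_radius)]
    kept = [s for s in samples if not math.isclose(s.radius, held_out_radius)]
    return kept, held_out


def compositional_exclusion_split(samples, held_out_composition):
    """Withhold every sample of one chemical composition."""
    held_out = [s for s in samples if s.composition == held_out_composition]
    kept = [s for s in samples if s.composition != held_out_composition]
    return kept, held_out


def relative_error(predicted, reference):
    """Relative error |predicted - reference| / |reference|."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(predicted - reference) / abs(reference)


# Placeholder data, not taken from the paper's dataset.
samples = [
    Sample("A", 1.0, 4.0, 10.0),
    Sample("A", 2.0, 4.0, 10.0),
    Sample("B", 1.0, 5.0, 8.0),
    Sample("B", 2.0, 5.0, 8.0),
]

kept_r, held_r = spatial_exclusion_split(samples, held_out_radius=2.0)
kept_c, held_c = compositional_exclusion_split(samples, held_out_composition="B")
err = relative_error(predicted=4.2, reference=4.0)
```

In this sketch, holding out a radius that lies between the retained radii would probe interpolation, while holding out the smallest or largest radius would probe extrapolation.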
Taxonomy
Topics: Machine Learning in Materials Science
