Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

TL;DR
This paper introduces MCS-Set, a multimodal materials dataset combining atomic geometries, 2D projections, and textual annotations, enabling advanced machine learning tasks and benchmarking in materials science.
Contribution
It presents a human-in-the-loop framework for creating a multimodal materials dataset that enhances data richness and supports new predictive and generative models.
Findings
Significant modality-specific performance gaps identified
High-quality annotations improve model generalization
MCS-Set enables benchmarking of multimodal models in materials science
Abstract
Most materials science datasets are limited to atomic geometries (e.g., XYZ files), restricting their utility for multimodal learning and comprehensive data-centric analysis. These constraints have historically impeded the adoption of advanced machine learning techniques in the field. This work introduces MultiCrystalSpectrumSet (MCS-Set), a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations, including lattice parameters and coordination metrics. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision. Leveraging a human-in-the-loop pipeline, MCS-Set combines domain expertise with standardized descriptors for high-quality annotation. Evaluations using state-of-the-art language and vision-language models…
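To make the three modalities named in the abstract concrete, the sketch below builds one toy record from an XYZ string: the parsed geometry, a 2D projection, and a short textual annotation with a coordination metric. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the record layout, the orthographic projection onto the xy-plane, and the fixed distance cutoff for counting neighbors are all hypothetical choices, and periodic boundary conditions are ignored.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


def parse_xyz(text):
    """Parse a standard XYZ string: atom count, comment line, then atom rows."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append(Atom(symbol, float(x), float(y), float(z)))
    return atoms


def project_xy(atoms):
    """Orthographic projection onto the xy-plane (assumed choice: drop z)."""
    return [(a.symbol, a.x, a.y) for a in atoms]


def coordination_numbers(atoms, cutoff):
    """Count neighbors within `cutoff` for each atom (no periodic images)."""
    coords = [(a.x, a.y, a.z) for a in atoms]
    return [
        sum(1 for j, q in enumerate(coords) if j != i and dist(p, q) <= cutoff)
        for i, p in enumerate(coords)
    ]


def build_record(xyz_text, cutoff):
    """Bundle geometry, 2D projection, and a text annotation (hypothetical layout)."""
    atoms = parse_xyz(xyz_text)
    cn = coordination_numbers(atoms, cutoff)
    mean_cn = sum(cn) / len(cn)
    return {
        "geometry": atoms,
        "projection_xy": project_xy(atoms),
        "coordination": cn,
        "annotation": f"{len(atoms)} atoms; mean coordination {mean_cn:.2f} "
                      f"within {cutoff} angstrom",
    }


# Toy planar cluster, invented for illustration only.
SAMPLE_XYZ = """4
toy cluster (hypothetical)
Zn 0.0 0.0 0.0
O 2.0 0.0 0.0
Zn 2.0 2.0 0.0
O 0.0 2.0 0.0
"""

record = build_record(SAMPLE_XYZ, cutoff=2.5)
print(record["annotation"])
```

In this toy square, each atom has two neighbors at 2.0 angstrom, and the diagonal (about 2.83 angstrom) falls outside the 2.5 angstrom cutoff. A real dataset would likely render the projection as an image and derive lattice parameters from the crystal cell, which this sketch does not attempt.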
Taxonomy
Topics: Machine Learning in Materials Science · History and advancements in chemistry
