Why Physics Still Matters: Improving Machine Learning Prediction of Material Properties with Phonon-Informed Datasets
Pol Benítez, Cibrán López, Edgardo Saucedo, Teruyasu Mizoguchi, Claudio Cazorla

TL;DR
This paper shows that training on physically informed, phonon-based datasets yields significantly better machine learning predictions of material properties than training on randomly generated datasets, underscoring the importance of data quality and physical relevance.
Contribution
The paper introduces a physically guided data generation strategy for training graph neural networks; models trained this way outperform those trained on randomly sampled data when predicting material properties at finite temperatures.
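To make the contrast between the two sampling strategies concrete, here is a minimal sketch of how displaced configurations might be generated in each case. It is an illustration under stated assumptions, not the authors' procedure: the function names, the toy ring-of-springs force-constant matrix, the classical equipartition amplitudes, and the arbitrary units are all hypothetical choices made for this example.

```python
import numpy as np

def random_displacements(n_atoms, sigma, n_samples, rng):
    """Baseline: independent Gaussian noise on every Cartesian coordinate."""
    return rng.normal(0.0, sigma, size=(n_samples, n_atoms, 3))

def phonon_displacements(force_constants, masses, kT, n_samples, rng, tol=1e-8):
    """Displacements drawn along harmonic normal modes.

    Assumes classical equipartition, <q^2> = kT / omega^2 per mode,
    with kT and the force constants given in consistent (arbitrary) units.
    """
    n_atoms = len(masses)
    inv_sqrt_m = 1.0 / np.sqrt(np.repeat(masses, 3))  # one entry per coordinate
    # Mass-weighted force-constant (dynamical) matrix.
    dyn = force_constants * np.outer(inv_sqrt_m, inv_sqrt_m)
    omega2, modes = np.linalg.eigh(dyn)
    keep = omega2 > tol                       # discard rigid translations
    std = np.sqrt(kT / omega2[keep])          # thermal amplitude of each mode
    q = rng.normal(size=(n_samples, int(keep.sum()))) * std
    u = (q @ modes[:, keep].T) * inv_sqrt_m   # back to Cartesian displacements
    return u.reshape(n_samples, n_atoms, 3)

# Hypothetical toy system: 4 equal masses on a ring, nearest-neighbour springs.
rng = np.random.default_rng(0)
eye4 = np.eye(4)
ring = 2 * eye4 - np.roll(eye4, 1, axis=0) - np.roll(eye4, -1, axis=0)
force_constants = np.kron(ring, np.eye(3))    # 12 x 12, atom-major ordering
masses = np.ones(4)

random_set = random_displacements(4, sigma=0.05, n_samples=100, rng=rng)
phonon_set = phonon_displacements(force_constants, masses, kT=0.01,
                                  n_samples=100, rng=rng)
```

In this sketch the random set perturbs every coordinate independently, whereas the phonon set weights soft (low-frequency) modes more heavily and excludes rigid translations, so its configurations are the ones a harmonic crystal would plausibly visit at the chosen temperature. In a real workflow the force constants would come from a phonon calculation rather than a toy spring model.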
Findings
Phonon-informed datasets yield better GNN performance with fewer training data points.
Physically guided data improves model explainability and relevance.
Larger datasets do not always enhance predictive accuracy.
Abstract
Machine learning (ML) methods have become powerful tools for predicting material properties with near first-principles accuracy and vastly reduced computational cost. However, the performance of ML models critically depends on the quality, size, and diversity of the training dataset. In materials science, this dependence is particularly important for learning from low-symmetry atomistic configurations that capture thermal excitations, structural defects, and chemical disorder, features that are ubiquitous in real materials but underrepresented in most datasets. The absence of systematic strategies for generating representative training data may therefore limit the predictive power of ML models in technologically critical fields such as energy conversion and photonics. In this work, we assess the effectiveness of graph neural network (GNN) models trained on two fundamentally different…
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Thermal properties of materials
