Big Data of Materials Science - Critical Role of the Descriptor
Luca M. Ghiringhelli, Jan Vybiral, Sergey V. Levchenko, Claudia Draxl, and Matthias Scheffler

TL;DR
This paper argues that reliable statistical learning in materials science depends on choosing a meaningful descriptor, and demonstrates a systematic way to find one, using the energy difference between zincblende/wurtzite and rocksalt semiconductors as the example.
Contribution
It introduces a systematic method for finding meaningful descriptors in materials science, addressing the challenge of causality in descriptor-property relationships.
Findings
A meaningful descriptor can be systematically identified for the energy difference between zincblende/wurtzite and rocksalt semiconductors.
Proper descriptor selection enhances trustworthiness of predictions in materials science.
The approach improves understanding of the causal links between descriptors and properties.
Abstract
Statistical learning of materials properties or functions so far starts with a largely silent, non-challenged step: the choice of the set of descriptive parameters (termed descriptor). However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, causality of the learned descriptor-property relation is uncertain. Thus, trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful. We analyse this issue and define requirements for a suited descriptor. For a classical example, the energy difference of zincblende/wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Materials Characterization Techniques
