Advancing descriptor search in materials science: feature engineering and selection strategies
Benedikt Hoock, Santiago Rigamonti, Claudia Draxl

TL;DR
This paper develops advanced feature engineering and selection methods using compressed sensing to identify low-dimensional, interpretable descriptors for predicting properties of materials, demonstrated on ternary group-IV compounds.
Contribution
It introduces new schemes for feature engineering based on basic properties and scalar representations, along with cross-validation based selection for high generalizability.
Findings
Effective descriptors for lattice constants identified
Descriptors accurately predict energies of mixing
Methods improve interpretability and predictive power
Abstract
A main goal of data-driven materials research is to find optimal low-dimensional descriptors that allow us to predict a physical property and that can be interpreted in a human-understandable way. In this work, we advance methods to identify descriptors out of a large pool of candidate features by means of compressed sensing. To this end, we develop schemes for engineering appropriate candidate features that are based on simple basic properties of the building blocks that constitute the materials and that are able to represent a multi-component system by scalar numbers. Cross-validation-based feature-selection methods are developed for identifying the most relevant features, with a focus on high generalizability. We apply our approaches to an ab initio dataset of ternary group-IV compounds to obtain a set of descriptors for predicting lattice constants and energies of mixing. In…
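The two ideas named in the abstract, scalar candidate features built from basic properties of building blocks and cross-validation-based selection, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the building blocks `A`, `B`, `C`, their property values, the feature names, and the synthetic target are made up and are not taken from the paper. In place of compressed sensing, the sketch uses a plain exhaustive search over one-dimensional descriptors scored by leave-one-out cross-validation.

```python
from math import sqrt

# Toy basic properties of three hypothetical building blocks (made-up numbers).
BLOCK_PROPS = {
    "A": {"radius": 0.8, "electroneg": 2.5},
    "B": {"radius": 1.1, "electroneg": 1.9},
    "C": {"radius": 1.4, "electroneg": 2.0},
}


def candidate_features(composition):
    """Map a composition {block: fraction} to scalar candidate features."""
    feats = {}
    for prop in ("radius", "electroneg"):
        pairs = [(BLOCK_PROPS[b][prop], x) for b, x in composition.items()]
        mean = sum(v * x for v, x in pairs)               # composition-weighted mean
        var = sum(x * (v - mean) ** 2 for v, x in pairs)  # composition-weighted variance
        feats[f"mean_{prop}"] = mean
        feats[f"var_{prop}"] = var
        feats[f"mean_{prop}^2"] = mean ** 2
    return feats


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # constant feature: fall back to predicting the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_cv_rmse(xs, ys):
    """Leave-one-out cross-validation RMSE of the one-feature linear model."""
    sq_errors = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq_errors.append((slope * xs[i] + intercept - ys[i]) ** 2)
    return sqrt(sum(sq_errors) / len(sq_errors))


def select_feature(compositions, targets):
    """Return the candidate feature with the lowest leave-one-out RMSE."""
    table = [candidate_features(c) for c in compositions]
    scores = {
        name: loo_cv_rmse([row[name] for row in table], targets)
        for name in table[0]
    }
    return min(scores, key=scores.get), scores


# Ternary compositions on a grid with steps of 1/4 (fractions sum to 1).
compositions = [
    {"A": i / 4, "B": j / 4, "C": (4 - i - j) / 4}
    for i in range(5)
    for j in range(5 - i)
]
# Synthetic target that depends linearly on the weighted mean radius.
targets = [3.0 + 2.0 * candidate_features(c)["mean_radius"] for c in compositions]

best, scores = select_feature(compositions, targets)
print(best, scores[best])
```

Scoring each candidate on held-out points rather than on the training fit is what ties the selection to generalizability. A sparse-regression step over the full feature pool would be needed to reach descriptors of higher dimension.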
