Representation of compounds for machine-learning prediction of physical properties
Atsuto Seko, Hiroyuki Hayashi, Keita Nakayama, Akira Takahashi, and Isao Tanaka

TL;DR
This paper introduces a systematic method for generating compound descriptors for machine learning models predicting physical properties, demonstrating high accuracy and efficiency across multiple datasets and properties.
Contribution
It presents a procedure for generating a systematic set of descriptors from simple elemental and structural representations, improving prediction accuracy and optimization efficiency in materials property modeling.
Findings
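A kernel ridge model achieved a prediction error of 0.041 eV/atom on a DFT cohesive energy dataset of about 18000 compounds, close to the "chemical accuracy" of 1 kcal/mol (0.043 eV/atom).
The descriptors also showed good predictive performance on two smaller datasets: DFT lattice thermal conductivity (110 compounds) and experimental melting temperature (248 compounds).
The method improved the efficiency of Bayesian optimization for materials properties.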
Abstract
The representations of a compound, called "descriptors" or "features", play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a systematic set of descriptors from simple elemental and structural representations. First it is applied to a large dataset composed of the cohesive energy for about 18000 compounds computed by density functional theory (DFT) calculation. As a result, we obtain a kernel ridge prediction model with a prediction error of 0.041 eV/atom, which is close to the "chemical accuracy" of 1 kcal/mol (0.043 eV/atom). The procedure is also applied to two smaller datasets, i.e., a dataset of the lattice thermal conductivity (LTC) for 110 compounds computed by DFT calculation and a dataset of the experimental melting temperature for 248 compounds. We examine the performance of the…
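As a rough illustration of the workflow the abstract describes (descriptors built from elemental representations, then a kernel ridge model), here is a minimal pure-Python sketch. Several things in it are assumptions made for illustration and are not taken from the paper: the choice of composition-weighted mean and standard deviation as the descriptor statistics, the two elemental properties used (atomic number and Pauling electronegativity), the Gaussian kernel and its hyperparameters, and the function names `descriptor`, `solve`, and `fit_kernel_ridge`. The target values are synthetic, computed from the descriptors themselves, and are not physical data. Structural representations are omitted entirely.

```python
import math

# Illustrative elemental representations (an assumption, not the paper's set):
# (atomic number, Pauling electronegativity)
ELEMENTS = {
    "Li": (3.0, 0.98), "Na": (11.0, 0.93), "K": (19.0, 0.82),
    "Mg": (12.0, 1.31), "O": (8.0, 3.44), "F": (9.0, 3.98), "Cl": (17.0, 3.16),
}

def descriptor(composition):
    """Composition-weighted mean and standard deviation of each elemental property."""
    total = sum(composition.values())
    n_props = len(next(iter(ELEMENTS.values())))
    out = []
    for p in range(n_props):
        mean = sum(n * ELEMENTS[el][p] for el, n in composition.items()) / total
        var = sum(n * (ELEMENTS[el][p] - mean) ** 2
                  for el, n in composition.items()) / total
        out += [mean, math.sqrt(var)]
    return out

def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_kernel_ridge(X, y, gamma=0.1, lam=1e-6):
    """Kernel ridge regression: solve (K + lam * I) alpha = y, return a predictor."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * gaussian_kernel(x, xi, gamma) for a, xi in zip(alpha, X))

# Toy training set of binary compounds.
train = [{"Li": 1, "F": 1}, {"Na": 1, "Cl": 1}, {"K": 1, "Cl": 1},
         {"Mg": 1, "O": 1}, {"Na": 1, "F": 1}]
X = [descriptor(c) for c in train]

# Synthetic targets derived from the descriptors (NOT physical property values).
y = [0.1 * d[0] + d[3] for d in X]

model = fit_kernel_ridge(X, y)
prediction = model(descriptor({"Li": 1, "Cl": 1}))  # query an unseen composition
```

In practice a library implementation such as scikit-learn's `KernelRidge` would replace the hand-written solver, and the kernel width and regularization strength would be chosen by cross-validation rather than fixed as they are here.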
