Global property prediction: A benchmark study on open source, perovskite-like datasets
Felix Mayr, Alessio Gagliardi

TL;DR
This study benchmarks machine-learning models on open-source, perovskite-like datasets for property prediction, revealing limitations in model generalization and showing that evaluation metrics depend strongly on the dataset.
Contribution
The study provides a comprehensive comparison of fingerprint-based ML models across seven open-source datasets, highlighting challenges in generalization and the effects of fingerprint reduction techniques.
Findings
Models do not perform evenly across different datasets.
Evaluation metrics are highly dependent on the specific database.
Fingerprint reduction via variance selection and autoencoders indicates that models use only a submanifold of the fingerprint space.
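Why a metric can depend on the database: one possible mechanism (an illustrative assumption, not necessarily the effect the authors analyse) is that R² normalizes the squared error by each database's target variance, while MAE does not. In this sketch the bandgap values and errors are hypothetical, and `mae` and `r2` are simple helper functions written for illustration. The same prediction errors are applied to a database with a narrow bandgap range and one with a wide range.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical bandgaps (eV) for two databases with different spreads.
narrow = np.array([1.4, 1.5, 1.6, 1.7])
wide = np.array([0.5, 1.5, 2.5, 3.5])

# Identical prediction errors (eV) applied to both databases.
errors = np.array([0.1, -0.1, 0.1, -0.1])

for name, y in [("narrow", narrow), ("wide", wide)]:
    y_pred = y + errors
    print(name, round(mae(y, y_pred), 3), round(r2(y, y_pred), 3))
```

Both databases get an MAE of 0.1 eV, but R² is about 0.2 for the narrow database and about 0.99 for the wide one, so a score that looks good on one database may not transfer to another.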
Abstract
Screening combinatorial space for novel materials, such as perovskite-like materials for photovoltaics, has produced a large amount of simulated high-throughput data and analyses thereof. This study presents a comprehensive comparison of structural-fingerprint-based machine-learning models on seven open-source databases of perovskite-like materials for predicting bandgaps and energies. It shows that none of the tested methods captures arbitrary databases evenly, and it underlines that commonly used metrics are highly database-dependent in typical workflows. In addition, the applicability of variance selection and autoencoders to significantly reduce fingerprint size indicates that models built on common fingerprints rely only on a submanifold of the available fingerprint space.
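Variance selection, one of the two reduction techniques named above, can be sketched in a few lines: fingerprint columns whose variance across the dataset falls below a threshold carry almost no information and are dropped. The matrix, the threshold of 1e-3 and the function name `variance_select` are hypothetical; the study's actual fingerprints and thresholds are not given here. scikit-learn's `VarianceThreshold` (in `sklearn.feature_selection`) applies the same keep-if-variance-exceeds-threshold rule.

```python
import numpy as np

def variance_select(X: np.ndarray, threshold: float = 0.0):
    """Keep only fingerprint columns whose variance exceeds `threshold`.

    Returns the reduced matrix and a boolean mask marking kept columns.
    """
    variances = X.var(axis=0)      # per-feature variance over all materials
    mask = variances > threshold   # strict inequality: equal-to-threshold is dropped
    return X[:, mask], mask

# Hypothetical fingerprint matrix: 4 materials x 6 features.
X = np.array([
    [0.0, 3.0, 1.2, 0.0, 0.9, 0.5000],
    [1.0, 3.0, 0.7, 0.0, 0.1, 0.5001],
    [2.0, 3.0, 1.9, 0.0, 0.4, 0.5000],
    [3.0, 3.0, 0.2, 0.0, 0.8, 0.5001],
])

X_red, mask = variance_select(X, threshold=1e-3)
print(X.shape, "->", X_red.shape)              # (4, 6) -> (4, 3)
print("kept columns:", np.flatnonzero(mask))    # [0 2 4]
```

The two constant columns and the nearly constant last column are removed, leaving three of the six features. An autoencoder goes further by learning a nonlinear low-dimensional encoding, but the underlying idea is the same: if a model keeps its accuracy on the reduced input, it was only using part of the original fingerprint space.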
