Efficient dataset generation for machine learning perovskite alloys
Henrietta Homm, Jarno Laakso, Patrick Rinke

TL;DR
This paper presents a data-efficient machine learning scheme that reduces computational costs in modeling complex perovskite alloys, enabling faster and more accurate property predictions for materials with potential in solar cell applications.
Contribution
The authors introduce a novel clustering and active learning approach that significantly decreases the number of DFT calculations needed for training ML models on perovskite alloys.
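As a rough illustration of how clustering and active learning can cut the number of DFT calculations, the sketch below picks one representative structure per cluster as the initial training set, then ranks the remaining candidates by model disagreement. It is a minimal sketch on synthetic data, not the authors' implementation. The descriptor matrix `X`, the function names, the cluster count, and the use of ensemble standard deviation as the uncertainty measure are all assumptions; this summary does not specify them.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so X itself is never modified.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def select_initial(X, k, seed=0):
    """Pick the structure closest to each centroid (at most k indices)."""
    centroids, _ = kmeans(X, k, seed=seed)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sorted({int(i) for i in d2.argmin(axis=0)})

def select_by_disagreement(preds, labelled, n_new):
    """Active-learning step (assumed criterion): choose the n_new unlabelled
    candidates on which an ensemble of models disagrees the most.
    preds has shape (n_models, n_candidates)."""
    std = preds.std(axis=0)
    std[np.asarray(list(labelled), dtype=int)] = -np.inf  # skip labelled ones
    return [int(i) for i in np.argsort(std)[::-1][:n_new]]

# Synthetic stand-ins: 200 candidate structures with 5 descriptor features,
# and predictions from a hypothetical 4-model ensemble.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
initial = select_initial(X, k=10)
preds = rng.normal(size=(4, 200))
new = select_by_disagreement(preds, initial, n_new=5)
```

In a real workflow only the structures in `initial` and `new` would be sent to DFT, and the model would be retrained after each batch of new labels.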
Findings
Reduces the number of DFT calculations needed for model training by up to 50%.
Achieves an average relaxation error of only 0.5 meV/atom for CsSn(Cl/Br/I)$_3$.
Demonstrates robustness and high accuracy of the ML model in structure relaxations.
Abstract
Lead-based perovskite solar cells have reached high efficiencies, but toxicity and lack of stability hinder their wide-scale adoption. These issues have been partially addressed through compositional engineering of perovskite materials, but the vast complexity of the perovskite materials space poses a significant obstacle to exploration. We previously demonstrated how machine learning (ML) can accelerate property predictions for the CsPb(Cl/Br)$_3$ perovskite alloy. However, the substantial computational demand of density functional theory (DFT) calculations required for model training prevents applications to more complex materials. Here, we introduce a data-efficient scheme to facilitate model training, validated initially on CsPb(Cl/Br)$_3$ data and extended to the ternary alloy CsSn(Cl/Br/I)$_3$. Our approach employs clustering to construct a compact yet diverse initial dataset of…
