Data-Efficient Machine Learning for Predicting Dopant Formation Energies in TiO$_2$ Monolayer
Kati Asikainen, Matti Alatalo, Marko Huttula, Assa Aravindh Sasikala Devi

TL;DR
This study demonstrates that physically grounded, compact datasets enable accurate machine learning predictions of dopant formation energies in TiO₂ monolayers, even with limited training data.
Contribution
It introduces a descriptor-based machine learning approach that achieves accurate, transferable predictions of dopant energies using small, physically relevant datasets.
Findings
Accurate predictions with limited data are possible using physically relevant descriptors.
Model performance remains robust across different dopant types.
Adding more data improves accuracy and transferability.
Abstract
Machine learning models are increasingly applied in materials science, yet their predictive power is often constrained by data scarcity. Here, we show that accurate predictions can be achieved, even with a limited number of training examples, provided the dataset is compact and grounded in physically relevant quantities. By combining density functional theory calculations with a machine-learning framework, we construct accurate descriptor-based models to predict the formation energies of doped lepidocrocite TiO₂ monolayers. The predictive accuracy of machine-learning models was first evaluated for single-dopant Pt configurations, demonstrating that the selected structural and chemical descriptors reliably capture the key factors governing dopant stability. Chemical transferability is then examined by extending the dataset to include Ag-doped configurations. Predictive accuracy…
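The abstract's central idea, fitting a descriptor-based regression model on a small dataset and checking its accuracy, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the three descriptor columns are hypothetical, the data are synthetic rather than DFT results, and ridge regression with leave-one-out validation stands in for whichever models and validation scheme the authors actually used.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha I) w = Xc^T yc for the descriptor weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def loo_mae(X, y, alpha=1e-3):
    """Leave-one-out mean absolute error, a common check for small datasets."""
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # hold out configuration i
        w, b = fit_ridge(X[mask], y[mask], alpha)
        errors.append(abs(X[i] @ w + b - y[i]))
    return float(np.mean(errors))

# Synthetic stand-in data (NOT from the paper): 20 "configurations",
# each described by 3 hypothetical structural/chemical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_w = np.array([0.8, -0.5, 0.3])            # assumed linear relationship
y = X @ true_w + 1.2 + rng.normal(scale=0.01, size=20)

w, b = fit_ridge(X, y)
mae = loo_mae(X, y)
print(f"fitted weights: {w}, intercept: {b:.3f}")
print(f"leave-one-out MAE on synthetic data: {mae:.4f}")
```

Leave-one-out validation is used here because, with only a few dozen samples, holding out a larger test set would leave too little data for fitting; the numbers it prints describe the synthetic data only and say nothing about the paper's reported accuracy.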
Taxonomy
Topics: Machine Learning in Materials Science · Electrocatalysts for Energy Conversion · Computational Drug Discovery Methods
