Impact of Data Bias on Machine Learning for Crystal Compound Synthesizability Predictions
Ali Davariashtiyani, Busheng Wang, Samad Hajinazar, Eva Zurek, and Sara Kadkhodaei

TL;DR
This paper investigates how data bias affects machine learning models predicting the synthesizability of crystal compounds, highlighting the importance of data quality and bias detection for reliable predictions.
Contribution
It demonstrates the impact of data bias on model performance and introduces procedures to detect and evaluate bias effects in crystal synthesizability prediction models.
Findings
Data bias significantly alters model predictions.
Unbalanced data propagates bias, reducing real-world applicability.
Simple procedures can detect data bias and evaluate its effect on model performance.
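If bias propagates into a trained model, one generic symptom is that the model's predictions follow a sample's data provenance rather than its synthesizability. A simple check is to compare the fraction of samples predicted synthesizable within each source. The sketch below is not the authors' procedure: the `predictions` (0/1 outputs of some trained classifier) and `sources` tags are hypothetical toy inputs, and the source names are placeholders.

```python
from collections import defaultdict


def positive_rate_by_source(predictions, sources):
    """Fraction of samples predicted synthesizable (label 1) within each data source."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, src in zip(predictions, sources):
        totals[src] += 1
        positives[src] += pred
    return {src: positives[src] / totals[src] for src in totals}


def max_rate_gap(rates):
    """Largest difference in positive-prediction rate between any two sources."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: predictions from some classifier, tagged by sample origin.
preds = [1, 1, 1, 0, 0, 1, 0, 0]
srcs = ["experimental"] * 4 + ["computational"] * 4
rates = positive_rate_by_source(preds, srcs)
print(rates)                # {'experimental': 0.75, 'computational': 0.25}
print(max_rate_gap(rates))  # 0.5
```

A large gap does not prove bias on its own, since the sources may genuinely differ in synthesizability, but it flags where a closer look at the training data is warranted.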
Abstract
Machine learning models are susceptible to being misled by biases in training data that emphasize incidental correlations over the intended learning task. In this study, we demonstrate the impact of data bias on the performance of a machine learning model designed to predict the synthesizability likelihood of crystal compounds. The model performs a binary classification on labeled crystal samples. Despite using the same architecture for the machine learning model, we showcase how the model's learning and prediction behavior differs once trained on distinct data. We use two data sets for illustration: a mixed-source data set that integrates experimental and computational crystal samples and a single-source data set consisting of data exclusively from one computational database. We present simple procedures to detect data bias and to evaluate its effect on the model's performance and…
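When a mixed-source data set integrates experimental and computational samples, the class label may be partly predictable from the source alone, which is the kind of incidental correlation the abstract warns about. A minimal sketch of one way to quantify this, assuming each sample carries a binary label and a source tag: score a "source-only" baseline that predicts every sample as the majority label of its source. This is an illustrative diagnostic, not the procedure from the paper, and the toy data and source names below are hypothetical.

```python
from collections import Counter, defaultdict


def source_only_baseline_accuracy(labels, sources):
    """In-sample accuracy of predicting each label as the majority label of its source.

    A value near 1.0 means labels are almost fully determined by the data source,
    so a model could score well by learning source artifacts alone.
    """
    by_source = defaultdict(Counter)
    for y, src in zip(labels, sources):
        by_source[src][y] += 1
    # Each source contributes the count of its most common label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_source.values())
    return correct / len(labels)


# Hypothetical mixed-source set: every experimental sample is labeled 1,
# every computational sample 0, so source fully determines the label.
mixed = source_only_baseline_accuracy(
    [1, 1, 1, 0, 0, 0], ["experimental"] * 3 + ["computational"] * 3
)
# Hypothetical single-source set: one database, balanced labels.
single = source_only_baseline_accuracy([1, 0, 1, 0, 1, 0], ["single_db"] * 6)
print(mixed, single)  # 1.0 0.5
```

With a single source the baseline cannot exploit provenance and falls back to the majority-class rate, which is why comparing the two numbers helps separate a source shortcut from genuine signal.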
Taxonomy
Topics: Machine Learning in Materials Science · X-ray Diffraction in Crystallography · Crystallization and Solubility Studies
