ET-AL: Entropy-Targeted Active Learning for Bias Mitigation in Materials Data
Hengrui Zhang, Wei Wayne Chen, James M. Rondinelli, Wei Chen

TL;DR
This paper introduces an entropy-based active learning framework to identify and mitigate bias in materials datasets, enhancing the diversity of underrepresented crystal systems and improving machine learning model performance.
Contribution
It proposes an entropy-targeted active learning method designed specifically to reduce bias in materials data collections.
Findings
ET-AL effectively reduces bias in materials datasets.
Improved diversity leads to better machine learning model accuracy.
The approach is applicable to autonomous data acquisition and dataset optimization.
Abstract
Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to…
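The abstract describes two ideas: an information-entropy metric for diversity within each crystal system, and an acquisition step that favors data which raise the diversity of underrepresented systems. The following Python sketch illustrates both under stated assumptions. It is not the authors' implementation: the histogram-based Shannon entropy, the `bin_width` parameter, the greedy selection rule, and all function names are hypothetical choices made for illustration. Candidate property values are assumed to be known here, whereas a real active learning loop would likely rely on a surrogate model's predictions.

```python
import math
from collections import defaultdict


def shannon_entropy(values, bin_width=0.5):
    """Shannon entropy (in nats) of values discretized into fixed-width bins.

    Assumption: a simple histogram estimator; the paper's estimator may differ.
    """
    if not values:
        return 0.0
    counts = defaultdict(int)
    for v in values:
        counts[math.floor(v / bin_width)] += 1
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def least_diverse_system(data, bin_width=0.5):
    """Return the crystal system whose property values have the lowest entropy."""
    return min(data, key=lambda s: shannon_entropy(data[s], bin_width))


def pick_candidate(data, candidates, bin_width=0.5):
    """Greedy step: choose the candidate that most increases the entropy
    of the least diverse crystal system (hypothetical acquisition rule).

    data: dict mapping crystal system -> list of property values
    candidates: list of (crystal system, property value) pairs
    """
    target = least_diverse_system(data, bin_width)
    base = shannon_entropy(data[target], bin_width)
    best, best_gain = None, 0.0
    for system, value in candidates:
        if system != target:
            continue
        gain = shannon_entropy(data[target] + [value], bin_width) - base
        if gain > best_gain:
            best, best_gain = (system, value), gain
    return target, best, best_gain


# Toy data (made up for illustration): cubic values cluster in one bin,
# triclinic values are spread across four bins.
data = {
    "cubic": [0.1, 0.2, 0.3, 0.4],
    "triclinic": [0.1, 0.6, 1.1, 1.6],
}
candidates = [("cubic", 0.25), ("cubic", 1.3), ("triclinic", 2.2)]

target, best, gain = pick_candidate(data, candidates)
print(target, best, round(gain, 3))
```

In this toy case the cubic entries all fall into one bin, so cubic is the targeted system, and the candidate that lands in a new bin is preferred over one that duplicates existing coverage. Repeating the step after adding each selected point to `data` gives a simple acquisition loop.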
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Machine Learning and Algorithms
