Iterative Corpus Refinement for Materials Property Prediction Based on Scientific Texts
Lei Zhang, Markus Stricker

TL;DR
This paper introduces an iterative method that refines a corpus of scientific texts to improve materials property prediction, identifying top candidate materials for electrocatalytic reactions despite limited data.
Contribution
The study presents a novel iterative corpus refinement framework that leverages scientific texts and Word2Vec models to enhance materials discovery in data-scarce settings.
Findings
Predicts high-performing materials for the ORR, HER, and OER.
Validates the predictions against experimental electrocatalytic measurements.
Demonstrates a scalable approach for large compositional spaces.
Abstract
The discovery and optimization of materials for specific applications is hampered by the practically infinite number of possible elemental combinations and associated properties, also known as the "combinatorial explosion". By the nature of the problem, data are scarce, so all available data sources should be used. In addition to simulations and experimental results, the latent knowledge in scientific texts is not yet used to its full potential. We present an iterative framework that refines a given scientific corpus by strategically selecting the most diverse documents, training Word2Vec models, and monitoring the convergence of composition-property correlations in embedding space. Our approach is applied to predict high-performing materials for the oxygen reduction (ORR), hydrogen evolution (HER), and oxygen evolution (OER) reactions across a large number of candidate compositions. Our…
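The abstract names three steps: selecting the most diverse documents, training Word2Vec models, and monitoring the convergence of composition-property correlations. It does not specify the diversity measure, the similarity score, or the stopping rule. The sketch below is one possible reading under loudly stated assumptions. Documents are reduced to token sets. Diversity is a greedy max-min Jaccard distance. Convergence is a Spearman rank correlation between successive candidate rankings that stays above a hypothetical threshold `tol`. The Word2Vec step is replaced by a hypothetical co-occurrence score (`cooccurrence_scores`) so that the example runs with the standard library alone. In practice, that callable could train a Word2Vec model (for example with gensim) and return cosine similarities between each composition token and a property keyword. All function names, parameters, and toy documents are illustrative and are not the authors' code.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple

Doc = Set[str]  # a document reduced to its set of tokens

def jaccard_distance(a: Doc, b: Doc) -> float:
    """1 - |a & b| / |a | b|; two empty documents count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def select_diverse(docs: Sequence[Doc], chosen: List[int], k: int) -> List[int]:
    """Greedy max-min selection (assumed diversity criterion): add k documents,
    each time taking the one whose nearest already-chosen document is farthest."""
    chosen = list(chosen)
    remaining = [i for i in range(len(docs)) if i not in chosen]
    for _ in range(min(k, len(remaining))):
        def nearest(i: int) -> float:
            if not chosen:
                return math.inf  # nothing chosen yet: every document ties
            return min(jaccard_distance(docs[i], docs[j]) for j in chosen)
        best = max(remaining, key=nearest)  # ties go to the lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen

def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant ranking carries no order information
    return cov / math.sqrt(sxx * syy)

def refine_corpus(docs: Sequence[Doc], score_candidates: Callable[[List[Doc]], List[float]],
                  batch: int = 2, tol: float = 0.99, patience: int = 1,
                  max_iter: int = 50) -> Tuple[List[int], Optional[List[float]]]:
    """Grow the corpus by diverse batches until the candidate ranking stops changing.
    batch, tol, patience and max_iter are hypothetical tuning knobs."""
    chosen: List[int] = []
    prev: Optional[List[float]] = None
    stable = 0
    for _ in range(max_iter):
        grown = select_diverse(docs, chosen, batch)
        if len(grown) == len(chosen):
            break  # every document has already been used
        chosen = grown
        scores = score_candidates([docs[i] for i in chosen])
        converged = prev is not None and spearman(prev, scores) >= tol
        stable = stable + 1 if converged else 0
        prev = scores
        if stable >= patience:
            break
    return chosen, prev

# Toy corpus (invented for illustration); doc 4 duplicates doc 0.
docs = [
    {"Pt", "ORR", "catalyst"},
    {"Pt", "ORR", "alloy"},
    {"IrO2", "OER", "anode"},
    {"Ni", "HER", "alkaline"},
    {"Pt", "ORR", "catalyst"},
    {"Cu", "CO2", "reduction"},
]
candidates = ["Pt", "IrO2", "Ni", "Cu"]

def cooccurrence_scores(subset: List[Doc]) -> List[float]:
    # Hypothetical stand-in for "train Word2Vec, then cosine(candidate, 'ORR')":
    # the fraction of selected documents that mention both the candidate and ORR.
    return [sum(1 for d in subset if c in d and "ORR" in d) / len(subset) for c in candidates]

chosen, scores = refine_corpus(docs, cooccurrence_scores)
print(chosen, scores)  # [0, 2, 3, 5] [0.25, 0.0, 0.0, 0.0]
```

On this toy corpus, the loop stops after two rounds with documents 0, 2, 3, and 5 selected. The near-duplicate document 4 is never pulled in because its distance to the already-chosen document 0 is zero. Comparing rank correlations rather than raw scores means the loop stops once the candidate order has settled, even if the score magnitudes are still shifting.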
Taxonomy
Topics: Machine Learning in Materials Science
