Iterative Corpus Refinement for Materials Property Prediction Based on Scientific Texts

Lei Zhang; Markus Stricker

arXiv:2505.21646 · cs.CL · June 11, 2025



Open Access

TL;DR

This paper introduces an iterative method that refines a corpus of scientific texts to improve materials property prediction, identifying top candidate compositions for electrocatalytic reactions despite scarce data.

Contribution

The study presents a novel iterative corpus refinement framework that uses scientific texts and Word2Vec models to support materials discovery in scarce-data scenarios.
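The framework ranks candidate compositions using Word2Vec models. A minimal sketch of such a ranking step follows, assuming (this summary does not confirm it) that a composition is represented as the fraction-weighted average of its element word vectors and scored by cosine similarity to a property keyword vector. The function names, the toy two-dimensional vectors, and the keyword "orr" are hypothetical; a real setup would take the vectors from a trained Word2Vec model.

```python
import math
from typing import List, Mapping, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def composition_vector(comp: Mapping[str, float], emb: Mapping[str, Vector]) -> List[float]:
    """Hypothetical representation: fraction-weighted average of element vectors."""
    total = sum(comp.values())
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for element, amount in comp.items():
        weight = amount / total  # normalize, so at.% or fractions both work
        for i, x in enumerate(emb[element]):
            vec[i] += weight * x
    return vec


def rank_compositions(
    candidates: Mapping[str, Mapping[str, float]],
    emb: Mapping[str, Vector],
    property_word: str,
) -> List[Tuple[str, float]]:
    """Sort candidate compositions by cosine similarity to the property word vector."""
    target = emb[property_word]
    scored = [(name, cosine(composition_vector(comp, emb), target))
              for name, comp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy usage with made-up 2-D "embeddings" (not from a trained model).
toy_emb = {"Pt": [1.0, 0.0], "Ni": [0.0, 1.0], "orr": [1.0, 0.0]}
toy_candidates = {
    "Pt": {"Pt": 100},
    "Ni": {"Ni": 100},
    "Pt50Ni50": {"Pt": 50, "Ni": 50},
}
ranking = rank_compositions(toy_candidates, toy_emb, "orr")
print([name for name, _ in ranking])  # ['Pt', 'Pt50Ni50', 'Ni']
```

With gensim, the vectors of a trained `Word2Vec` model are reachable through its `wv` attribute, so a dict such as `{w: model.wv[w] for w in words}` could stand in for `toy_emb`, provided the element symbols and the property keyword are in the model's vocabulary.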

Findings

01 Successfully predicts high-performing materials for the oxygen reduction (ORR), hydrogen evolution (HER), and oxygen evolution (OER) reactions.

02 Validates the predictions against experimental electrocatalytic measurements.

03 Demonstrates a scalable approach for large compositional spaces.
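The size of such compositional spaces can be made concrete with a stars-and-bars count. As a sketch, assuming a grid with a fixed step in at.% where every element is present with at least one step (the grid, the 5 at.% step, and the element list are illustrative choices, not taken from the paper), a k-element system with n = 100/step steps has C(n - 1, k - 1) compositions.

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterator, Sequence


def composition_grid(elements: Sequence[str], step_pct: int) -> Iterator[Dict[str, int]]:
    """Yield every composition (in at.%) on a grid of spacing step_pct.

    Each element gets at least step_pct, and the amounts sum to 100.
    """
    if step_pct <= 0 or 100 % step_pct != 0:
        raise ValueError("step_pct must be a positive divisor of 100")
    n_steps = 100 // step_pct
    k = len(elements)
    # Stars and bars: choose k-1 cut points among the n_steps-1 interior gaps.
    for cuts in combinations(range(1, n_steps), k - 1):
        bounds = (0,) + cuts + (n_steps,)
        yield {el: (bounds[i + 1] - bounds[i]) * step_pct for i, el in enumerate(elements)}


elements = ["Ag", "Ir", "Pd", "Pt", "Ru"]  # illustrative quinary system
grid = list(composition_grid(elements, 5))
print(len(grid), comb(19, 4))  # 3876 3876
```

One quinary system already yields 3,876 grid points; choosing which 5 of 10 candidate elements to combine multiplies this by C(10, 5) = 252, which illustrates why exhaustive experimental screening is impractical.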

Abstract

The discovery and optimization of materials for specific applications is hampered by the practically infinite number of possible elemental combinations and associated properties, also known as the "combinatorial explosion". By the nature of the problem, data are scarce, and all possible data sources should be used. In addition to simulations and experimental results, the latent knowledge in scientific texts is not yet used to its full potential. We present an iterative framework that refines a given scientific corpus by strategically selecting the most diverse documents, training Word2Vec models, and monitoring the convergence of composition-property correlations in embedding space. Our approach is applied to predict high-performing materials for oxygen reduction (ORR), hydrogen evolution (HER), and oxygen evolution (OER) reactions for a large number of possible candidate compositions. Our…
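The abstract outlines a loop: select diverse documents, train a model on them, and stop once composition-property correlations stabilize. A minimal sketch follows, under assumptions the abstract does not confirm: documents are token sets, diversity is measured by greedy max-min Jaccard distance, and "convergence" means the top-k candidate ranking repeats between iterations. The `rank_candidates` callable is a hypothetical stand-in for training Word2Vec on the selected documents and ranking compositions in embedding space; the toy ranker below simply counts mentions.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Doc = Set[str]


def jaccard_distance(a: Doc, b: Doc) -> float:
    """1 - |a & b| / |a | b|; two empty documents count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse(docs: Sequence[Doc], selected: List[int], k: int) -> List[int]:
    """Greedily add up to k documents, each maximizing its minimum distance to those chosen."""
    chosen = list(selected)
    remaining = [i for i in range(len(docs)) if i not in chosen]
    for _ in range(k):
        if not remaining:
            break
        if chosen:
            best = max(remaining,
                       key=lambda i: min(jaccard_distance(docs[i], docs[j]) for j in chosen))
        else:
            best = remaining[0]  # arbitrary seed document
        chosen.append(best)
        remaining.remove(best)
    return chosen


def refine_corpus(
    docs: Sequence[Doc],
    rank_candidates: Callable[[List[Doc]], List[str]],
    batch: int = 2,
    top_k: int = 1,
    min_overlap: float = 1.0,
    max_iter: int = 10,
) -> Tuple[List[int], List[str], int]:
    """Grow the corpus until the top-k ranking stops changing (or max_iter is reached)."""
    selected: List[int] = []
    prev_top: Optional[List[str]] = None
    top: List[str] = []
    for iteration in range(1, max_iter + 1):
        selected = select_diverse(docs, selected, batch)
        top = rank_candidates([docs[i] for i in selected])[:top_k]
        if prev_top is not None:
            overlap = len(set(prev_top) & set(top)) / max(len(top), 1)
            if overlap >= min_overlap:
                return selected, top, iteration
        prev_top = top
    return selected, top, max_iter


# Toy usage: token-set "documents" and a mention-count ranker standing in for Word2Vec.
toy_docs = [
    {"pt", "orr", "catalyst"},
    {"pt", "orr", "catalyst", "alloy"},
    {"ni", "her", "alkaline"},
    {"ir", "oer", "acidic"},
    {"pt", "her"},
]


def toy_rank(subset: List[Doc]) -> List[str]:
    """Rank candidates by how many selected documents mention them (ties broken by name)."""
    candidates = ["ir", "ni", "pt"]
    return sorted(candidates, key=lambda c: (-sum(c in d for d in subset), c))


print(refine_corpus(toy_docs, toy_rank))  # ([0, 2, 3, 4, 1], ['pt'], 3)
```

Greedy max-min (farthest-point) selection is a cheap, common diversity heuristic; the paper's actual selection criterion and convergence measure may differ.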


Taxonomy

Topics: Machine Learning in Materials Science