Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science
Jane Greenberg, Scott McClellan, Addy Ireland, Robert Sammarco, Colton Gerber, Christopher B. Rauch, Mat Kelly, John Kunze, Yuan An, Eric Toberer

TL;DR
This paper presents MatSci-YAMZ, a platform combining AI and human-in-the-loop approaches to efficiently develop metadata vocabularies in materials science, demonstrating a successful proof of concept and scalability potential.
Contribution
Introduction of MatSci-YAMZ, a novel AI-HILT platform that accelerates metadata vocabulary creation through crowdsourcing and iterative AI refinement in materials science.
Findings
Successful proof of concept demonstrated
Alignment with FAIR and open-science principles confirmed
Potential for scalability across different scientific domains
Abstract
Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT), including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain. Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ platform over several weeks, contributing term definitions and providing examples to prompt refinement of the AI-generated definitions. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT…
Taxonomy
Topics: Machine Learning in Materials Science · Research Data Management Practices · Scientific Computing and Data Management
