Making Metadata More FAIR Using Large Language Models
Sowmya S. Sundaram, Mark A. Musen

TL;DR
This paper introduces FAIRMetaText, an NLP-based tool that uses Large Language Models to analyze and compare the natural language descriptions of metadata. It improves the FAIRness (findability, accessibility, interoperability, and reusability) of metadata by suggesting compliant terms and grouping similar ones, which reduces the human effort needed for curation.
Contribution
The paper presents an NLP application, FAIRMetaText, that uses LLMs to measure the similarity between metadata descriptions, with the aim of improving metadata quality and consistency in scientific data management.
Findings
Large language models significantly improve metadata comparison accuracy.
FAIRMetaText reduces manual effort in metadata curation.
Quantitative and qualitative evaluations show large gains in metadata tasks.
Abstract
With the global increase in experimental data artifacts, harnessing them in a unified fashion leads to a major stumbling block - bad metadata. To bridge this gap, this work presents a Natural Language Processing (NLP) informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural language descriptions of metadata and provides a mathematical similarity measure between two terms. This measure can then be utilized for analyzing varied metadata, by suggesting terms for compliance or grouping similar terms for identification of replaceable terms. The efficacy of the algorithm is presented qualitatively and quantitatively on publicly available research artifacts and demonstrates large gains across metadata related tasks through an in-depth study of a wide variety of Large Language Models (LLMs). This software can drastically reduce the human…
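The idea of scoring two metadata descriptions and then suggesting the closest compliant term can be illustrated with a minimal sketch. It is not the FAIRMetaText implementation. It assumes cosine similarity as the measure (the abstract only says "a mathematical similarity measure"), and it replaces the LLM embedding with a toy bag-of-words vector so that the example runs without any model. The names `embed`, `cosine`, `suggest`, and the `STANDARD` vocabulary are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM embedding: a bag-of-words count vector.

    Assumption: a real system would call a language model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest(description, standard_terms):
    """Return the standard term whose description is most similar."""
    query = embed(description)
    return max(
        standard_terms,
        key=lambda term: cosine(query, embed(standard_terms[term])),
    )


# Hypothetical controlled vocabulary: term -> natural language description.
STANDARD = {
    "age": "age of the subject in years at the time of sample collection",
    "tissue": "type of tissue the sample was taken from",
    "sex": "biological sex of the subject",
}

# A non-compliant field, described in free text by a data submitter.
best = suggest("tissue type from which the sample was collected", STANDARD)
print(best)
```

The same pairwise score could also drive the grouping use case: terms whose descriptions score above a chosen threshold would be candidates for replacement by one another. Swapping `embed` for a model-backed embedding leaves the rest of the sketch unchanged.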
