Making Metadata More FAIR Using Large Language Models

Sowmya S. Sundaram; Mark A. Musen

arXiv:2307.13085·cs.CL·May 2, 2024

TL;DR

This paper introduces FAIRMetaText, an NLP-based tool leveraging Large Language Models to analyze and compare metadata descriptions, improving the FAIRness of metadata by suggesting compliant terms and grouping similar ones, thus reducing human effort.

Contribution

It presents a novel NLP application that uses LLMs to measure similarity between metadata descriptions, enhancing metadata quality and consistency in scientific data management.

Findings

01. Large language models significantly improve metadata comparison accuracy.

02. FAIRMetaText reduces manual effort in metadata curation.

03. Quantitative and qualitative evaluations show large gains in metadata tasks.

Abstract

With the global increase in experimental data artifacts, a major stumbling block to harnessing them in a unified fashion is bad metadata. To address this problem, this work presents a Natural Language Processing (NLP)-informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural language descriptions of metadata and provides a mathematical similarity measure between two terms. This measure can then be used to analyze varied metadata, either by suggesting terms for compliance or by grouping similar terms to identify replaceable ones. The efficacy of the algorithm is presented qualitatively and quantitatively on publicly available research artifacts, and an in-depth study of a wide variety of Large Language Models (LLMs) demonstrates large gains across metadata-related tasks. This software can drastically reduce the human…
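To make the idea of a similarity measure between two metadata terms concrete, here is a minimal sketch of how a description could be matched to the closest term in a standard vocabulary. It is not the FAIRMetaText implementation: the `embed` function is a toy bag-of-words stand-in for an LLM embedding, and the names `suggest_term` and `standard_terms`, along with the example descriptions, are hypothetical. Cosine similarity is assumed here as the measure; the abstract does not say which measure the tool uses.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM embedding: a sparse bag-of-words count vector.

    Assumption: a real system would call a language model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def suggest_term(description, standard_terms):
    """Return (name, score) for the standard term with the most similar description."""
    query = embed(description)
    scored = [
        (name, cosine_similarity(query, embed(term_description)))
        for name, term_description in standard_terms.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical standard vocabulary: term name -> natural language description.
standard_terms = {
    "organism": "species of the organism from which the sample was taken",
    "collection_date": "date on which the sample was collected",
    "tissue": "tissue type from which the sample was taken",
}

best, score = suggest_term("the date the sample was collected on", standard_terms)
print(best, round(score, 3))
```

The same pairwise scores could also drive the grouping use case: terms whose descriptions score above a chosen threshold against each other would be candidates for replacement by a single term.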
