Identity resolution of software metadata using Large Language Models
Eva Martín del Pico, Josep Lluís Gelpí, Salvador Capella-Gutiérrez

TL;DR
This paper evaluates instruction-tuned large language models for software metadata identity resolution, aiming to improve the consolidation of heterogeneous research software metadata for FAIRness monitoring.
Contribution
It benchmarks multiple models against human annotations and introduces an agreement-based proxy for high-confidence automated identity resolution of software metadata records.
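The agreement-based proxy can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each model returns a binary verdict ("same" or "different") for a pair of metadata records, and that a pair is resolved automatically only when every model gives the same verdict. The function names, label strings, and pair identifiers are hypothetical and are not taken from the paper. Pairs on which the models disagree are deferred to human review. The sketch reports two numbers: the fraction of auto-resolved pairs whose verdict matches the human annotation, and the fraction of pairs that were auto-resolved at all.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical verdict labels: "same" (records describe the same software) or "different".
Verdict = str


def agreement_proxy(verdicts: List[Verdict]) -> Optional[Verdict]:
    """Return the shared verdict if all models agree; otherwise None (defer to a human)."""
    if verdicts and len(set(verdicts)) == 1:
        return verdicts[0]
    return None


def evaluate(
    model_verdicts: Dict[str, List[Verdict]], human: Dict[str, Verdict]
) -> Tuple[float, float]:
    """Score agreed-upon decisions against human annotations.

    Returns (precision, coverage), where precision is the fraction of auto-resolved
    pairs that match the human label and coverage is the fraction of pairs resolved.
    """
    decided = {}
    for pair, verdicts in model_verdicts.items():
        verdict = agreement_proxy(verdicts)
        if verdict is not None:
            decided[pair] = verdict
    correct = sum(1 for pair, verdict in decided.items() if human[pair] == verdict)
    precision = correct / len(decided) if decided else 0.0
    coverage = len(decided) / len(model_verdicts) if model_verdicts else 0.0
    return precision, coverage


# Toy example: three hypothetical models judging three hypothetical record pairs.
model_verdicts = {
    "pairA": ["same", "same", "same"],
    "pairB": ["same", "different", "same"],   # disagreement: deferred to a human
    "pairC": ["different", "different", "different"],
}
human = {"pairA": "same", "pairB": "same", "pairC": "same"}

precision, coverage = evaluate(model_verdicts, human)
print(precision, round(coverage, 2))  # prints: 0.5 0.67
```

Requiring unanimity trades coverage for precision: fewer pairs are resolved automatically, but those that are resolved are the ones the models collectively treat as unambiguous, which is consistent with the paper's observation that models struggle in ambiguous cases.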
Findings
High precision achieved with the agreement-based proxy
Models show limitations in ambiguous cases
Automating semantic judgment remains challenging
Abstract
Software is an essential component of research. However, little attention has been paid to it compared with that paid to research data. Recently, there has been an increase in efforts to acknowledge and highlight the importance of software in research activities. Structured metadata from platforms like bio.tools, Bioconductor, and Galaxy ToolShed offers valuable insights into research software in the Life Sciences. Although originally intended to support discovery and integration, this metadata can be repurposed for large-scale analysis of software practices. However, its quality and completeness vary across platforms, reflecting diverse documentation practices. To gain a comprehensive view of software development and sustainability, consolidating this metadata is necessary, but requires robust mechanisms to address its heterogeneity and scale. This article presents an evaluation of…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Cell Image Analysis Techniques
Methods: Softmax · Attention Is All You Need
