Variable Extraction for Model Recovery in Scientific Literature

Chunwei Liu; Enrique Noriega-Atala; Adarsh Pyarelal; Clayton T. Morrison; Mike Cafarella

arXiv:2411.14569·cs.IR·November 25, 2024

TL;DR

This paper evaluates methods for extracting variables from scientific literature, introduces a benchmark dataset, and demonstrates that large language models outperform rule-based systems on this task, which supports automatic model recovery.

Contribution

It introduces a benchmark dataset for variable extraction and shows that LLMs significantly outperform rule-based methods in extracting variables from scientific papers.

Findings

1. LLMs outperform rule-based extraction methods.
2. Combining rule-based and LLM methods yields marginal improvements.
3. LLMs show strong potential for automatic scientific artifact comprehension.

Abstract

The global output of academic publications exceeds 5 million articles per year, making it difficult for humans to keep up with even a tiny fraction of scientific output. We need methods to navigate and interpret the artifacts (texts, graphs, charts, code, models, and datasets) that make up the literature. This paper evaluates various methods for extracting mathematical model variables from epidemiological studies, such as "infection rate ( $α$ )," "recovery rate ( $γ$ )," and "mortality rate ( $μ$ )." Variable extraction may appear to be a basic task, but it plays a pivotal role in recovering models from scientific literature. Once extracted, these variables can be used for automatic mathematical modeling, simulation, and replication of published results. We introduce a benchmark dataset comprising manually annotated variable descriptions and variable values extracted from…
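To make the task concrete, here is a minimal sketch of what a toy rule-based extractor might look like for phrases of the form "description ( $symbol$ )", as in the examples quoted in the abstract. It is not the rule-based system or the LLM pipeline evaluated in the paper; the pattern, the stop-word list, and the function name `extract_variables` are all assumptions made for illustration.

```python
import re

# Hypothetical stop words to trim from the front of a captured description.
STOPWORDS = {"a", "an", "and", "as", "of", "such", "the"}

# Assumed surface pattern: up to three words, then a symbol written as ( $...$ ).
VAR_PATTERN = re.compile(
    r"(?P<desc>[A-Za-z]+(?:\s[A-Za-z]+){0,2})"
    r"\s*\(\s*\$(?P<symbol>[^$]+)\$\s*\)"
)


def extract_variables(text):
    """Return (description, symbol) pairs found by the toy pattern."""
    pairs = []
    for match in VAR_PATTERN.finditer(text):
        words = match.group("desc").split()
        # Drop leading function words such as "as" or "and".
        while words and words[0].lower() in STOPWORDS:
            words.pop(0)
        if words:
            pairs.append((" ".join(words), match.group("symbol").strip()))
    return pairs


sample = (
    "such as infection rate ( $α$ ), recovery rate ( $γ$ ), "
    "and mortality rate ( $μ$ )."
)
print(extract_variables(sample))
```

A pattern like this only covers one surface form and breaks on longer descriptions or symbols introduced without parentheses, which gives some intuition for why hand-written rules are a hard baseline to scale.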

Taxonomy

Topics: Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Image Processing and 3D Reconstruction