LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Curtis Chong; Jorge Colindres

arXiv:2604.07649·cs.IR·May 19, 2026

TL;DR

LitXBench is a benchmark framework for evaluating methods that extract complete experimental measurements from scientific literature, in support of property prediction and materials discovery.

Contribution

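The paper introduces LitXBench, a framework for benchmarking methods that extract experiments from literature, and presents LitXAlloy, a dense benchmark of 1426 measurements from 19 alloy papers whose entries are stored as Python objects to allow programmatic validation.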

Findings

01

Frontier language models, such as Gemini 3.1 Pro Preview, outperform existing multi-turn extraction pipelines by up to 0.37 F1.

02

Extraction pipelines tend to associate measurements with compositions rather than processing steps.

03

Storing benchmark entries as Python objects, rather than text-based formats such as CSV or JSON, improves auditability and enables programmatic data validation.
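
To make the third finding concrete, here is a minimal sketch of what a benchmark entry stored as a Python object might look like. The class names, fields, and validation rules (`Measurement`, `Sample`, the atomic-fraction check) are assumptions made for illustration; they are not LitXBench's actual schema. The point is only that an object can reject a malformed entry when it is constructed, which a CSV row or JSON record cannot do on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured property value (hypothetical schema, not LitXBench's)."""
    property_name: str
    value: float
    unit: str

    def __post_init__(self):
        # Validation runs at construction, so a bad entry fails as soon as
        # the benchmark file is imported.
        if not self.unit:
            raise ValueError(f"{self.property_name}: unit is required")
        if self.property_name == "yield_strength" and self.value <= 0:
            raise ValueError("yield_strength must be positive")


@dataclass(frozen=True)
class Sample:
    """An alloy composition together with the processing it received."""
    composition: dict   # element symbol -> atomic fraction (assumed encoding)
    processing: tuple   # ordered processing steps, e.g. ("as-cast",)
    measurements: tuple = ()

    def __post_init__(self):
        total = sum(self.composition.values())
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"atomic fractions sum to {total}, expected 1.0")


# A well-formed entry: the measurement hangs off a processed sample,
# not off the bare composition.
sample = Sample(
    composition={"Al": 0.5, "Ti": 0.5},
    processing=("as-cast",),
    measurements=(Measurement("yield_strength", 850.0, "MPa"),),
)
```

Attaching measurements to a sample that carries its processing steps, rather than to the composition alone, also mirrors the distinction drawn in the second finding. The numeric values here are invented placeholders, not data from the paper.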

Abstract

Aggregating experimental data from papers enables materials scientists to build better property prediction models and to facilitate scientific discovery. Recently, interest has grown in extracting not only single material properties but also entire experimental measurements. To support this shift, we introduce LitXBench, a framework for benchmarking methods that extract experiments from literature. We also present LitXAlloy, a dense benchmark comprising 1426 total measurements from 19 alloy papers. By storing the benchmark's entries as Python objects, rather than text-based formats such as CSV or JSON, we improve auditability and enable programmatic data validation. We find that frontier language models, such as Gemini 3.1 Pro Preview, outperform existing multi-turn extraction pipelines by up to 0.37 F1. Our results suggest that this performance gap arises because extraction pipelines…
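
The F1 figures quoted above compare extracted measurements against a gold set. The sketch below shows one simple way such a score could be computed, assuming exact matching on hypothetical (composition, processing, property, value) tuples; the benchmark's real matching rules are not described in this summary, and the alloy name and values are invented toy data, not results from the paper.

```python
def f1_score(predicted, gold):
    """F1 between two collections of hashable records, using exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy gold set: the same composition measured after two processing routes.
gold = {
    ("AlCoCrFeNi", "as-cast", "hardness", 520.0),
    ("AlCoCrFeNi", "annealed", "hardness", 480.0),
}

# Toy prediction: the second value is tied to the right composition but the
# wrong processing step, so it does not match any gold record.
predicted = {
    ("AlCoCrFeNi", "as-cast", "hardness", 520.0),
    ("AlCoCrFeNi", "as-cast", "hardness", 480.0),
}

score = f1_score(predicted, gold)
print(score)  # 0.5
```

In this toy case one of two predictions matches, giving precision and recall of 0.5 each. It illustrates how an extractor that links a value to the composition but not to the correct processing step loses credit under a metric that keys on both.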
