# SciDaSynth: Interactive Structured Data Extraction From Scientific Literature With Large Language Model

**Authors:** Xingbo Wang, Samantha L. Huey, Rui Sheng, Saurabh Mehta, Fei Wang

PMC · DOI: 10.1002/cl2.70073 · 2025-11-03

## TL;DR

SciDaSynth is an interactive system that uses large language models to extract data from scientific papers into structured tables, producing high-quality results more efficiently than baseline methods.

## Contribution

The paper introduces SciDaSynth, an interactive system that uses large language models to generate structured data tables from users' queries by integrating information from text, tables, and figures.

## Key findings

- SciDaSynth produces high-quality structured data more efficiently than baseline methods.
- The system supports validation and refinement of table data through multi-faceted visual summaries and semantic grouping, which help resolve cross-document inconsistencies.
- A within-subjects study with nutrition and NLP researchers demonstrated its effectiveness on data extraction tasks.

## Abstract

The explosion of scientific literature has made the efficient and accurate extraction of structured data a critical component for advancing scientific knowledge and supporting evidence‐based decision‐making. However, existing tools often struggle to extract and structure multimodal, varied, and inconsistent information across documents into standardized formats. We introduce SciDaSynth, a novel interactive system powered by large language models that automatically generates structured data tables according to users' queries by integrating information from diverse sources, including text, tables, and figures. Furthermore, SciDaSynth supports efficient table data validation and refinement, featuring multi‐faceted visual summaries and semantic grouping capabilities to resolve cross‐document data inconsistencies. A within‐subjects study with nutrition and NLP researchers demonstrates SciDaSynth's effectiveness in producing high‐quality structured data more efficiently than baseline methods. We discuss design implications for human–AI collaborative systems supporting data extraction tasks.
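To make the idea of grouping inconsistent values across documents concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how semantic grouping is computed, so this example substitutes plain lexical normalization (lowercasing and stripping punctuation) for it. The function names `normalize` and `group_cells` and the sample values are hypothetical.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with a space, and trim."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", value.lower())
    return cleaned.strip()


def group_cells(cells):
    """Group (document_id, value) pairs whose values share a normalized key.

    Assumption: lexical normalization stands in for the semantic grouping
    described in the abstract, whose actual method is not specified there.
    """
    groups = defaultdict(list)
    for doc_id, value in cells:
        groups[normalize(value)].append((doc_id, value))
    return dict(groups)


# Hypothetical values extracted for one table column from three papers.
cells = [
    ("paper_a", "Beta-carotene"),
    ("paper_b", "beta carotene"),
    ("paper_c", "Vitamin A"),
]
groups = group_cells(cells)
```

Each group collects the spelling variants that a reviewer could then merge into one standardized value. A lexical key like this only catches surface differences; synonyms or abbreviations would need a richer similarity measure.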

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12581027/full.md

---
Source: https://tomesphere.com/paper/PMC12581027