Diagnosing Structural Failures in LLM-Based Evidence Extraction for Meta-Analysis

Zhiyin Tan; Jennifer D'Souza

arXiv:2602.10881 · cs.CL · February 12, 2026

Open Access

TL;DR

This paper evaluates the ability of large language models to perform structured evidence extraction for meta-analyses, revealing significant limitations in relational and numerical accuracy that hinder reliable automation.

Contribution

It introduces a diagnostic framework and evaluation protocol to systematically assess LLMs' structural fidelity in evidence extraction for meta-analysis, highlighting key failure modes.

Findings

01. Performance drops sharply with complex relational tasks
02. Long-context inputs worsen extraction reliability
03. Systematic structural errors hinder accurate meta-analytic data extraction

Abstract

Systematic reviews and meta-analyses rely on converting narrative articles into structured, numerically grounded study records. Despite rapid advances in large language models (LLMs), it remains unclear whether they can meet the structural requirements of this process, which hinge on preserving roles, methods, and effect-size attribution across documents rather than on recognizing isolated entities. We propose a structural, diagnostic framework that evaluates LLM-based evidence extraction as a progression of schema-constrained queries with increasing relational and numerical complexity, enabling precise identification of failure points beyond atom-level extraction. Using a manually curated corpus spanning five scientific domains, together with a unified query suite and evaluation protocol, we evaluate two state-of-the-art LLMs under both per-document and long-context, multi-document…
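
To make the distinction between atom-level and structural extraction concrete, the following is a minimal sketch and not the paper's evaluation protocol. The `EffectRecord` schema, its field names, the two scoring functions, and the toy data are all hypothetical. It shows how a prediction can contain every correct value in isolation while attributing each effect size to the wrong study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EffectRecord:
    # Hypothetical schema for one study record; not taken from the paper.
    study: str
    method: str
    outcome: str
    effect_size: float


def atom_recall(gold, pred):
    """Share of gold (field, value) pairs found anywhere in the predictions."""
    def atoms(records):
        return {(f.name, getattr(r, f.name)) for r in records for f in fields(r)}
    gold_atoms = atoms(gold)
    return len(gold_atoms & atoms(pred)) / len(gold_atoms) if gold_atoms else 0.0


def record_recall(gold, pred):
    """Share of gold records reproduced with every field attached correctly."""
    gold_set = set(gold)
    return len(gold_set & set(pred)) / len(gold_set) if gold_set else 0.0


# Toy data (invented for illustration): the prediction swaps the two effect sizes.
GOLD = [
    EffectRecord("Study A", "RCT", "anxiety", 0.42),
    EffectRecord("Study B", "cohort", "anxiety", 0.17),
]
PRED = [
    EffectRecord("Study A", "RCT", "anxiety", 0.17),
    EffectRecord("Study B", "cohort", "anxiety", 0.42),
]

print(atom_recall(GOLD, PRED))    # every isolated value is present
print(record_recall(GOLD, PRED))  # no record has the right attribution
```

In this toy case the atom-level score is perfect while the record-level score is zero, which is the kind of gap a structure-aware evaluation is meant to expose.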

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Biomedical Text Mining and Ontologies