Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles
Sai Koneru, Jian Wu, Sarah Rajtmajer

TL;DR
This paper investigates methods for extracting hypotheses and supporting statistical evidence from full-text scientific articles, emphasizing the importance of effective context selection to improve retrieval and extraction accuracy using large language models.
Contribution
It introduces a two-stage retrieve-and-extract framework and systematically studies retrieval design choices, highlighting the benefits of targeted context selection for hypothesis extraction.
Findings
Targeted context selection improves hypothesis extraction performance.
Retrieval quality and context cleanliness significantly impact extraction success.
Statistical evidence extraction remains challenging even with optimal context.
Abstract
Extracting hypotheses and their supporting statistical evidence from full-text scientific articles is central to the synthesis of empirical findings, but remains difficult due to document length and the distribution of scientific arguments across sections of the paper. We study a sequential full-text extraction setting, where the statement of a primary finding in an article's abstract is linked to (i) a corresponding hypothesis statement in the paper body and (ii) the statistical evidence that supports or refutes that hypothesis. This formulation induces a challenging within-document retrieval setting in which many candidate paragraphs are topically related to the finding but differ in rhetorical role, creating hard negatives for retrieval and extraction. Using a two-stage retrieve-and-extract framework, we conduct a controlled study of retrieval design choices, varying context…
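As a rough illustration of the retrieval stage in a retrieve-and-extract setup, the sketch below ranks an article's paragraphs against a finding statement and assembles the top-ranked ones into a context for a downstream extractor. It is not the authors' method: the TF-IDF cosine scoring, the function names `rank_paragraphs` and `build_context`, the `top_k` parameter, and the toy paragraphs are all assumptions made for illustration, and the extraction stage (for example, prompting a language model with the selected context) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_paragraphs(finding, paragraphs, top_k=2):
    """Return the top_k (index, score) pairs, scored by TF-IDF cosine similarity.

    Hypothetical helper: a simple lexical retriever standing in for whatever
    retriever a real system would use.
    """
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # Document frequency: number of paragraphs that contain each token.
    df = Counter(tok for d in docs for tok in d)
    # Smoothed inverse document frequency over this article's paragraphs.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

    def weight(counts):
        # Tokens never seen in the article get zero weight.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weight(Counter(tokenize(finding)))
    scored = [(i, cosine(query, weight(d))) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_context(paragraphs, ranked):
    """Join the selected paragraphs in document order for the extractor."""
    keep = sorted(i for i, _ in ranked)
    return "\n\n".join(paragraphs[i] for i in keep)


# Toy data, invented for this sketch.
paragraphs = [
    "We thank the reviewers for their comments.",
    "We hypothesize that sleep deprivation reduces working memory accuracy.",
    "Participants were recruited from a university subject pool.",
]
finding = "Sleep deprivation reduced working memory accuracy."
context = build_context(paragraphs, rank_paragraphs(finding, paragraphs, top_k=1))
```

A purely lexical scorer like this one cannot tell a hypothesis statement from a results paragraph that reuses the same vocabulary, which is the hard-negative problem the abstract describes. The sketch only shows where context selection sits in the pipeline.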
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
