SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning

Wenhan Yu, Zhaoxi Zhang, Wang Chen, Guanqiang Qi, Weikang Li, Lei Sha, Deguo Xia, and Jizhou Huang

arXiv:2511.15090 · cs.DB · March 31, 2026

TL;DR

SciEGQA is a new dataset for scientific document question answering that grounds each answer in annotated evidence regions; training on it improves models' reasoning over scientific documents.

Contribution

Introduces SciEGQA, a dataset with human-annotated evidence regions and a large-scale auto-constructed training set for scientific QA and reasoning.

Findings

1. Models struggle with evidence localization and reasoning in scientific documents.
2. Training on SciEGQA improves models' scientific reasoning capabilities.

Abstract

Scientific documents contain complex multimodal structures, which makes evidence localization and scientific reasoning in Document Visual Question Answering particularly challenging. However, most existing benchmarks evaluate models only at the page level without explicitly annotating the evidence regions that support the answer, which limits both interpretability and the reliability of evaluation. To address this limitation, we introduce SciEGQA, a scientific document question answering and reasoning dataset with semantic evidence grounding, where supporting evidence is represented as semantically coherent document regions annotated with bounding boxes. SciEGQA consists of two components: a human-annotated fine-grained benchmark containing 1,623 high-quality question-answer pairs, and a large-scale automatically constructed training set with over 30K QA pairs generated through…
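The abstract represents supporting evidence as document regions annotated with bounding boxes. As a rough illustration only, the sketch below assumes a hypothetical record layout (`EvidenceRegion`, `QARecord`, a page index plus `(x0, y0, x1, y1)` box corners) and scores a predicted evidence region against the annotated ones with intersection-over-union (IoU), a common box-overlap measure. Neither the field names nor IoU as SciEGQA's localization metric is confirmed by the text above.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceRegion:
    # Hypothetical layout: page index and box corners (x0, y0, x1, y1) in page pixels.
    page: int
    box: tuple[float, float, float, float]


@dataclass
class QARecord:
    # Hypothetical QA record: question, answer, and its annotated evidence regions.
    question: str
    answer: str
    evidence: list[EvidenceRegion] = field(default_factory=list)


def box_iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def region_iou(pred, gold):
    """IoU between two regions; boxes on different pages never overlap."""
    if pred.page != gold.page:
        return 0.0
    return box_iou(pred.box, gold.box)


def best_match_iou(pred, record):
    """Highest IoU between a predicted region and any annotated evidence region."""
    return max((region_iou(pred, g) for g in record.evidence), default=0.0)
```

For example, with a toy gold box of 100 x 100 pixels, a prediction of the same size shifted sideways by 50 pixels on the same page scores an IoU of 1/3, and a prediction on a different page scores 0.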

Code & Models

Repositories

https://yuwenhan07.github.io/SciEGQA-project