SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models
Jing Yu, Yuqi Tang, Kehua Feng, Mingyang Rao, Lei Liang, Zhiqiang Zhang, Mengshu Sun, Wen Zhang, Qiang Zhang, Keyan Ding, Huajun Chen

TL;DR
SciCUEval is a new comprehensive benchmark dataset designed to evaluate large language models' understanding of scientific contexts across multiple domains and data modalities, addressing a gap in existing evaluation tools.
Contribution
The paper introduces SciCUEval, a multi-domain, multi-modal benchmark dataset for assessing scientific context understanding in large language models, and uses it to evaluate model capabilities in detail.
Findings
LLMs show strengths in some scientific tasks but struggle with others.
The benchmark reveals specific limitations in multi-source information integration.
Insights guide future development of scientific-domain LLMs.
Abstract
Large Language Models (LLMs) have shown impressive capabilities in contextual understanding and reasoning. However, evaluating their performance across diverse scientific domains remains underexplored, as existing benchmarks primarily focus on general domains and fail to capture the intricate complexity of scientific data. To bridge this gap, we construct SciCUEval, a comprehensive benchmark dataset tailored to assess the scientific context understanding capability of LLMs. It comprises ten domain-specific sub-datasets spanning biology, chemistry, physics, biomedicine, and materials science, integrating diverse data modalities including structured tables, knowledge graphs, and unstructured texts. SciCUEval systematically evaluates four core competencies: Relevant information identification, Information-absence detection, Multi-source information integration, and Context-aware inference,…
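To make the four competencies concrete, here is a minimal Python sketch of how per-competency accuracy might be tallied for a benchmark organized this way. The `Item` fields, the `accuracy_by_competency` helper, and the case-insensitive exact-match scoring are all assumptions made for illustration; they are not taken from the SciCUEval release or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Competency(Enum):
    # The four competencies named in the abstract.
    RELEVANT_INFO_IDENTIFICATION = "Relevant information identification"
    INFO_ABSENCE_DETECTION = "Information-absence detection"
    MULTI_SOURCE_INTEGRATION = "Multi-source information integration"
    CONTEXT_AWARE_INFERENCE = "Context-aware inference"


@dataclass(frozen=True)
class Item:
    # Hypothetical item schema; the real dataset format may differ.
    domain: str        # e.g. "chemistry"
    modality: str      # e.g. "table", "knowledge graph", "text"
    competency: Competency
    gold: str          # reference answer


def accuracy_by_competency(items, predictions):
    """Return {Competency: accuracy} using case-insensitive exact match.

    Exact match is an assumed scoring rule, chosen only to keep the
    sketch short.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.competency] += 1
        if pred.strip().lower() == item.gold.strip().lower():
            correct[item.competency] += 1
    return {c: correct[c] / total[c] for c in total}
```

Reporting accuracy per competency rather than as a single number is what would let a weakness such as multi-source information integration show up separately from the other three.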
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies
