SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models

Jing Yu; Yuqi Tang; Kehua Feng; Mingyang Rao; Lei Liang; Zhiqiang Zhang; Mengshu Sun; Wen Zhang; Qiang Zhang; Keyan Ding; Huajun Chen

arXiv:2505.15094 · cs.CL · May 22, 2025

TL;DR

SciCUEval is a new comprehensive benchmark dataset designed to evaluate large language models' understanding of scientific contexts across multiple domains and data modalities, addressing a gap in existing evaluation tools.

Contribution

The paper introduces SciCUEval, a multi-domain, multi-modal benchmark dataset for assessing scientific context understanding in large language models, and reports a detailed evaluation of model capabilities on it.

Findings

01. LLMs show strengths in some scientific tasks but struggle with others.

02. The benchmark reveals specific limitations in multi-source information integration.

03. Insights guide future development of scientific-domain LLMs.

Abstract

Large Language Models (LLMs) have shown impressive capabilities in contextual understanding and reasoning. However, evaluating their performance across diverse scientific domains remains underexplored, as existing benchmarks primarily focus on general domains and fail to capture the intricate complexity of scientific data. To bridge this gap, we construct SciCUEval, a comprehensive benchmark dataset tailored to assess the scientific context understanding capability of LLMs. It comprises ten domain-specific sub-datasets spanning biology, chemistry, physics, biomedicine, and materials science, integrating diverse data modalities including structured tables, knowledge graphs, and unstructured texts. SciCUEval systematically evaluates four core competencies: Relevant information identification, Information-absence detection, Multi-source information integration, and Context-aware inference,…
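
As a rough illustration of how results on the four competencies named in the abstract could be summarized, the sketch below computes exact-match accuracy per competency. The record layout (`competency`, `prediction`, `answer`), the function name `accuracy_by_competency`, and the toy records are assumptions made for this example; they are not the benchmark's actual data format or scoring script.

```python
from collections import defaultdict

# Competency names are taken from the abstract; everything else is hypothetical.
COMPETENCIES = (
    "Relevant information identification",
    "Information-absence detection",
    "Multi-source information integration",
    "Context-aware inference",
)


def accuracy_by_competency(records):
    """Return {competency: fraction of exact matches} for competencies seen in records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        comp = rec["competency"]
        if comp not in COMPETENCIES:
            raise ValueError(f"unknown competency: {comp!r}")
        total[comp] += 1
        # Exact string match is an assumed scoring rule for this sketch.
        if rec["prediction"] == rec["answer"]:
            correct[comp] += 1
    return {comp: correct[comp] / total[comp] for comp in total}


# Made-up toy records, not data from the paper.
toy_records = [
    {"competency": "Information-absence detection", "prediction": "A", "answer": "A"},
    {"competency": "Information-absence detection", "prediction": "B", "answer": "C"},
    {"competency": "Relevant information identification", "prediction": "D", "answer": "D"},
]

scores = accuracy_by_competency(toy_records)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Reporting one score per competency, rather than a single pooled number, keeps a weakness in one skill (for example, multi-source integration) from being hidden by strength in another.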

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies

Methods: Focus