HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery
Yaping Zhang, Qixuan Zhang, Xingquan Zhang, Zhiyuan Chen, Wenwen Zhuang, Yupu Liang, Lu Xiang, Yang Zhao, Jiajun Zhang, Yu Zhou, and Chengqing Zong

TL;DR
HiSciBench is a hierarchical benchmark that evaluates the scientific intelligence of foundation models across multiple disciplines and five stages of the scientific workflow. It exposes a large performance gap between basic literacy tasks and discovery-level tasks.
Contribution
It introduces a multi-level, multi-disciplinary benchmark that assesses models on the full scientific workflow, from literacy to discovery, with integrated, dependency-aware evaluation.
Findings
Models perform well on basic literacy tasks (up to 69% accuracy).
Performance drops significantly on complex discovery tasks (down to 25%).
HiSciBench provides detailed diagnostics for scientific reasoning capabilities.
Abstract
The rapid advancement of large language models (LLMs) and multimodal foundation models has sparked growing interest in their potential for scientific research. However, scientific intelligence encompasses a broad spectrum of abilities ranging from understanding fundamental knowledge to conducting creative discovery, and existing benchmarks remain fragmented. Most focus on narrow tasks and fail to reflect the hierarchical and multi-disciplinary nature of real scientific inquiry. We introduce HiSciBench, a hierarchical benchmark designed to evaluate foundation models across five levels that mirror the complete scientific workflow: Scientific Literacy (L1), Literature Parsing (L2), Literature-based Question Answering (L3), Literature Review Generation (L4), and Scientific Discovery (L5). HiSciBench contains 8,735 carefully curated…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Artificial Intelligence in Healthcare and Education
