STEMVerse: A Dual-Axis Diagnostic Framework for STEM Reasoning in Large Language Models
Xuzhao Li, Xuchen Li, Jian Zhao, Shiyu Hu

TL;DR
STEMVerse is a diagnostic framework that systematically evaluates the STEM reasoning of large language models by analyzing their performance across academic disciplines and levels of cognitive complexity, revealing structural failure patterns.
Contribution
The paper introduces a dual-axis diagnostic approach that maps LLM performance on STEM problems along two axes, academic discipline and cognitive complexity, offering finer-grained insight than aggregate scores.
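As a rough illustration of what a dual-axis breakdown adds over a single aggregate score, the sketch below groups per-problem results by a (discipline, cognition) pair and reports accuracy per cell. This is only a minimal sketch under stated assumptions: the record fields, the label values ("physics", "recall", "apply"), and the function name `dual_axis_accuracy` are hypothetical and are not taken from the paper, whose actual taxonomy and scoring procedure are not given here.

```python
from collections import defaultdict


def dual_axis_accuracy(records):
    """Return {(discipline, cognition): accuracy} from per-problem results.

    Each record is a dict with hypothetical keys 'discipline', 'cognition',
    and 'correct' (bool). These names are illustrative assumptions.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        key = (r["discipline"], r["cognition"])
        totals[key] += 1
        correct[key] += int(r["correct"])
    return {key: correct[key] / totals[key] for key in totals}


# Toy data with made-up labels, for illustration only.
records = [
    {"discipline": "physics", "cognition": "recall", "correct": True},
    {"discipline": "physics", "cognition": "recall", "correct": True},
    {"discipline": "physics", "cognition": "apply", "correct": True},
    {"discipline": "physics", "cognition": "apply", "correct": False},
    {"discipline": "math", "cognition": "recall", "correct": True},
    {"discipline": "math", "cognition": "apply", "correct": False},
]

# A single aggregate score hides where the errors are concentrated.
aggregate = sum(r["correct"] for r in records) / len(records)
grid = dual_axis_accuracy(records)

print(f"aggregate accuracy: {aggregate:.2f}")
for (discipline, cognition), acc in sorted(grid.items()):
    print(f"{discipline:8s} {cognition:7s} {acc:.2f}")
```

In this toy data the aggregate looks moderate, while the per-cell view shows that every error falls in the "apply" column, which is the kind of structural pattern a discipline-by-cognition grid is meant to expose.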
Findings
Re-aggregated 20,000+ STEM problems into a unified capability space
Identified structural failure patterns in LLM STEM reasoning
Provided actionable insights for model improvement
Abstract
As Large Language Models (LLMs) achieve significant breakthroughs in complex reasoning tasks, evaluating their proficiency in science, technology, engineering, and mathematics (STEM) has become a primary method for measuring machine intelligence. However, current evaluation paradigms often treat benchmarks as isolated "silos," offering only monolithic aggregate scores that neglect the intricacies of both academic specialization and cognitive depth. This result-oriented approach fails to distinguish whether model errors stem from insufficient domain knowledge or deficiencies in cognitive capacity, thereby limiting the diagnostic value. To address this, we propose STEMVerse, a diagnostic framework designed to systematically analyze the STEM reasoning capabilities of LLMs. This framework characterizes model performance across academic specialization and cognitive complexity to map the…
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education
