STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing

Jiaru Zou; Qing Wang; Pratyush Thakur; Nickvash Kani

arXiv:2411.00387·cs.CL·June 3, 2025


Open Access

TL;DR

This paper introduces STEM-PoM, a benchmark dataset built from real-world scientific documents to evaluate and improve large language models' understanding of math symbols in context, revealing significant performance gaps.

Contribution

STEM-PoM is the first comprehensive dataset for evaluating LLMs' reasoning about math symbols in scientific text, intended to guide future model improvements.

Findings

1. State-of-the-art LLMs achieve an average accuracy of 20-60% in symbol classification under in-context learning.
2. Fine-tuning raises accuracy to 50-60%.
3. A significant gap remains in LLMs' mathematical reasoning abilities.

Abstract

Advances in large language models (LLMs) have spurred research into enhancing their reasoning capabilities, particularly in math-rich STEM (Science, Technology, Engineering, and Mathematics) documents. While LLMs can generate equations or solve math-related queries, their ability to fully understand and interpret abstract mathematical symbols in long, math-rich documents remains limited. In this paper, we introduce STEM-PoM, a comprehensive benchmark dataset designed to evaluate LLMs' reasoning abilities on math symbols within contextual scientific text. The dataset, sourced from real-world arXiv documents, contains over 2K math symbols classified under four main attributes (variables, constants, operators, and unit descriptors), with additional sub-attributes: scalar/vector/matrix for variables, and local/global/discipline-specific for both constants and operators. Our…
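The two-level label scheme described in the abstract can be pictured as a small data structure. The sketch below is a hypothetical illustration, not the authors' data format: the names `MainAttribute`, `SUB_ATTRIBUTES`, `SymbolLabel`, and `accuracy` are invented here, and it assumes unit descriptors carry no sub-attribute because the abstract lists none for them. The `accuracy` helper is plain exact-match accuracy over full labels, which may differ from the paper's actual evaluation protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MainAttribute(Enum):
    """The four main attributes named in the abstract."""
    VARIABLE = "variable"
    CONSTANT = "constant"
    OPERATOR = "operator"
    UNIT_DESCRIPTOR = "unit descriptor"


# Allowed sub-attributes per main attribute, as listed in the abstract.
# Assumption: unit descriptors take no sub-attribute (none is listed).
SUB_ATTRIBUTES = {
    MainAttribute.VARIABLE: {"scalar", "vector", "matrix"},
    MainAttribute.CONSTANT: {"local", "global", "discipline-specific"},
    MainAttribute.OPERATOR: {"local", "global", "discipline-specific"},
    MainAttribute.UNIT_DESCRIPTOR: set(),
}


@dataclass(frozen=True)
class SymbolLabel:
    """Hypothetical record for one annotated symbol occurrence."""
    symbol: str
    main: MainAttribute
    sub: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject sub-attributes that do not belong to the main attribute.
        allowed = SUB_ATTRIBUTES[self.main]
        if allowed and self.sub not in allowed:
            raise ValueError(
                f"{self.main.value!r} requires one of {sorted(allowed)}, got {self.sub!r}"
            )
        if not allowed and self.sub is not None:
            raise ValueError(f"{self.main.value!r} takes no sub-attribute")


def accuracy(gold: list, predicted: list) -> float:
    """Exact-match accuracy: a prediction counts only if main and sub both match."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    gold = [
        SymbolLabel("x", MainAttribute.VARIABLE, "scalar"),
        SymbolLabel("c", MainAttribute.CONSTANT, "global"),
        SymbolLabel("m/s", MainAttribute.UNIT_DESCRIPTOR),
    ]
    pred = [
        SymbolLabel("x", MainAttribute.VARIABLE, "vector"),  # wrong sub-attribute
        SymbolLabel("c", MainAttribute.CONSTANT, "global"),
        SymbolLabel("m/s", MainAttribute.UNIT_DESCRIPTOR),
    ]
    print(accuracy(gold, pred))  # 2 of 3 labels match: 0.6666666666666666
```

Validating sub-attributes at construction time keeps impossible combinations (such as a "global" variable) out of the label set, so any accuracy figure is computed only over well-formed labels.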


Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Topic Modeling