Let's Verify Math Questions Step by Step
Chengyu Shen, Zhen Hao Wong, Runming He, Hao Liang, Meiyi Qiang, Zimo Meng, Zhengyang Zhao, Bohan Zeng, Zhengzhou Zhu, Bin Cui, Wentao Zhang

TL;DR
This paper introduces ValiMath, a high-quality benchmark of 2147 verified math questions, and MathQ-Verify, a pipeline for parsing and verifying question correctness to improve dataset quality for LLM training.
Contribution
The paper presents ValiMath, a new benchmark for evaluating question correctness, and introduces MathQ-Verify, a novel method for fine-grained parsing and semantic verification of math questions.
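The idea of a verification pipeline can be illustrated as a sequence of checks that a question must pass before it is kept. The sketch below is a minimal illustration under stated assumptions, not MathQ-Verify itself: the stage names (`format`, `conditions`, `goal`) and every function here are hypothetical, and the string heuristics stand in for the semantic, model-based checks the paper describes. A question is rejected at the first stage that fails, and the failing stage is reported.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical check type: returns None if the question passes, else a short failure reason.
Check = Callable[[str], Optional[str]]

def balanced_latex_delimiters(question: str) -> Optional[str]:
    """Hypothetical format-level check: braces and $ delimiters must balance."""
    depth = 0
    for ch in question:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return "unmatched closing brace"
    if depth != 0:
        return "unmatched opening brace"
    if question.count("$") % 2 != 0:
        return "unbalanced $ delimiters"
    return None

# Hypothetical toy "condition parser": single-letter variables assigned numeric values.
ASSIGNMENT = re.compile(r"\b([a-zA-Z])\s*=\s*(-?\d+(?:\.\d+)?)\b")

def no_conflicting_assignments(question: str) -> Optional[str]:
    """Hypothetical stand-in for contradiction detection between parsed conditions."""
    seen = {}
    for var, value in ASSIGNMENT.findall(question):
        v = float(value)
        if var in seen and seen[var] != v:
            return f"conflicting values for {var}"
        seen[var] = v
    return None

GOAL_WORDS = {"find", "compute", "determine", "prove", "show",
              "calculate", "evaluate", "what", "how", "which"}

def has_explicit_goal(question: str) -> Optional[str]:
    """Hypothetical stand-in for goal detection: a '?' or a task word must appear."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    if "?" in question or words & GOAL_WORDS:
        return None
    return "no identifiable goal"

def verify(question: str, checks: List[Tuple[str, Check]]) -> Tuple[bool, Optional[str]]:
    """Run checks in order; reject at the first failure and report which stage failed."""
    for name, check in checks:
        reason = check(question)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, None

PIPELINE = [
    ("format", balanced_latex_delimiters),
    ("conditions", no_conflicting_assignments),
    ("goal", has_explicit_goal),
]
```

For example, `verify("Given $x = 3$ and $x = 5$, find $x + 1$.", PIPELINE)` is rejected at the `conditions` stage, while `verify("Find $x$ if $2x + 3 = 7$.", PIPELINE)` passes every stage. Ordering cheap syntactic checks before semantic ones keeps the more expensive checks from running on malformed input.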
Findings
ValiMath provides a reliable gold-standard dataset for math question evaluation.
MathQ-Verify achieves state-of-the-art accuracy in question verification tasks.
The pipeline significantly reduces noise in mathematical datasets, improving data quality for LLM training.
Abstract
Large Language Models (LLMs) have recently achieved remarkable progress in mathematical reasoning. To enable such capabilities, many existing works distill strong reasoning models into long chains of thought or design algorithms to construct high-quality math question-answer (QA) data for training. However, these efforts primarily focus on generating correct reasoning paths and answers, while largely overlooking the correctness of the questions themselves. In this work, we present ValiMath, a benchmark consisting of 2147 human-verified mathematical questions covering a wide range of domains such as arithmetic, algebra, and geometry, which are synthesized and curated from the NuminaMath dataset. Each question is annotated with its logical structure, domain coverage, and question correctness, enabling fine-grained evaluation of question quality. ValiMath serves as a high-quality…
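Because each ValiMath question carries a gold correctness annotation, the benchmark can be used to score any question verifier as a binary classifier. The sketch below assumes the labels are reduced to booleans (`True` = question judged correct) and uses standard classification metrics; the function name `score_verifier` is hypothetical, and this summary does not state which exact metrics the paper reports.

```python
from typing import Dict, Sequence

def score_verifier(gold: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Compare verifier predictions against gold correctness labels.

    Positive class = "question is correct", so precision measures how clean
    the set of questions the verifier keeps would be.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))              # correct, kept
    fp = sum((not g) and p for g, p in zip(gold, pred))        # flawed, kept
    fn = sum(g and (not p) for g, p in zip(gold, pred))        # correct, rejected
    tn = sum((not g) and (not p) for g, p in zip(gold, pred))  # flawed, rejected
    total = len(gold)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With `gold = [True, True, False, False, True]` and `pred = [True, False, False, True, True]`, three of five predictions match, giving an accuracy of 0.6, while precision, recall, and F1 each equal 2/3.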
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Topic Modeling · Natural Language Processing Techniques
Methods: Focus
