PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering

Xiangfeng Wang; Hangyu Guo; Yanlin Lai; Mitt Huang; Liang Zhao; Chengyuan Yao; Yinmin Zhang; Qi Han; Xiaoxiao Ren; Chun Yuan; Tong Xu; Zheng Ge; Xiangyu Zhang; Daxin Jiang

arXiv:2602.11570 · cs.CL · February 13, 2026

Open Access

TL;DR

PRIME is a benchmark for evaluating whether verifiers can check both the final answer of a STEM solution and the derivation that produced it, with the aim of making reinforcement learning rewards more reliable.

Contribution

The paper introduces PRIME, a benchmark for process-outcome alignment verification, and demonstrates that it helps improve RLVR training on mathematical and engineering problems.

Findings

01. Current verifiers often miss derivation errors.

02. Process-aware RLVR training improves performance significantly.

03. Verifier accuracy on PRIME correlates strongly with RLVR training success.

Abstract

While model-based verifiers are essential for scaling Reinforcement Learning with Verifiable Rewards (RLVR), current outcome-centric verification paradigms focus primarily on the consistency between the final result and the ground truth, often neglecting potential errors in the derivation process. As a result, correct answers produced from incorrect derivations can receive positive rewards. To bridge this gap, we introduce PRIME, a benchmark for evaluating verifiers on Process-Outcome Alignment verification in Mathematics and Engineering. Curated from a comprehensive collection of college-level STEM problems, PRIME comprises 2,530 high-difficulty samples selected through a consistency-based filtering pipeline. Through extensive evaluation, we find that current verifiers frequently fail to detect derivation flaws. Furthermore, we propose a process-aware RLVR training paradigm utilizing…
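The gap the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the `Verdict` type, its two fields, and both reward functions are hypothetical names chosen for illustration, and the binary 0/1 reward is an assumption. It only contrasts a reward that checks the final answer with one that also requires a valid derivation.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical verifier output for one model response (illustrative only)."""
    answer_matches: bool    # final result agrees with the ground truth
    derivation_valid: bool  # no flaw was detected in the derivation


def outcome_reward(v: Verdict) -> float:
    """Outcome-centric reward: only the final answer is checked."""
    return 1.0 if v.answer_matches else 0.0


def process_aware_reward(v: Verdict) -> float:
    """Reward only when the answer and the derivation are both judged correct."""
    return 1.0 if (v.answer_matches and v.derivation_valid) else 0.0


# A correct answer reached through a flawed derivation.
lucky = Verdict(answer_matches=True, derivation_valid=False)
print(outcome_reward(lucky), process_aware_reward(lucky))  # prints: 1.0 0.0
```

Under these assumptions, the outcome-centric reward still pays out for the flawed response, while the process-aware reward does not. That is the case a verifier must detect to score well on process-outcome alignment.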

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling