Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis

Zhisong Qiu; Shuofei Qiao; Kewei Xu; Yuqi Zhu; Lun Du; Ningyu Zhang; Huajun Chen

arXiv:2604.24198·cs.CL·April 28, 2026

TL;DR

This paper introduces DataPRM, a process-level reward model for data analysis agents that detects silent errors and improves policy learning, outperforming baselines with only 4B parameters.

Contribution

The work presents DataPRM, a novel environment-aware generative reward model that actively verifies intermediate states and distinguishes error types, advancing process reward modeling in dynamic data analysis.

Findings

01

DataPRM improves downstream policy LLMs by 7.21% on ScienceAgentBench.

02

DataPRM achieves 11.28% improvement on DABStep with Best-of-N inference.

03

DataPRM outperforms strong baselines with only 4B parameters and generalizes across strategies.
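Best-of-N inference, as referenced in the second finding, samples several candidate trajectories and keeps the one the reward model rates highest. The sketch below is illustrative only: the trajectory ids, the step scores, and the choice to aggregate step scores with `min` are assumptions made for the example, not details taken from the paper.

```python
from typing import Dict, List


def aggregate(step_scores: List[float]) -> float:
    # Assumption: a trajectory is only as good as its weakest step.
    # Other aggregations (mean, product, last step) are equally plausible.
    return min(step_scores) if step_scores else 0.0


def best_of_n(candidates: Dict[str, List[float]]) -> str:
    # Return the id of the candidate with the highest aggregated score.
    return max(candidates, key=lambda cid: aggregate(candidates[cid]))


# Hypothetical per-step scores that a process reward model might assign.
candidates = {
    "traj_a": [0.9, 0.2, 0.95],   # one weak step drags the trajectory down
    "traj_b": [0.7, 0.75, 0.8],   # consistently reasonable steps
    "traj_c": [0.6, 0.65, 0.5],
}

selected = best_of_n(candidates)
```

With `min` aggregation, `traj_a` loses despite its two strong steps, which reflects the intuition that a single flawed intermediate step can invalidate an analysis.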

Abstract

Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical study revealing that general-domain PRMs struggle to supervise data analysis agents. Specifically, they fail to detect silent errors (logical flaws that yield incorrect results without triggering interpreter exceptions) and erroneously penalize exploratory actions, mistaking necessary trial-and-error exploration for grounding failures. To bridge this gap, we introduce DataPRM, a novel environment-aware generative process reward model that (1) can serve as an active verifier, autonomously interacting with the environment to probe intermediate execution states and uncover silent errors, and (2)…
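To make the notion of a silent error concrete, the following toy example shows an analysis step that runs without any exception yet returns a wrong answer, together with a simple check on the intermediate state that would expose it. The data, the function names, and the probe are hypothetical and are not drawn from DataPRM itself.

```python
# Hypothetical intermediate state: values read from a CSV are still strings.
raw_sales = ["100", "20", "9"]


def buggy_max(values):
    # Runs without raising, but compares strings lexicographically,
    # so "9" is treated as larger than "100".
    return max(values)


def checked_max(values):
    # Convert to numbers first so the comparison is numeric.
    return max(float(v) for v in values)


def probe_is_numeric(values):
    # Toy stand-in for an intermediate-state check a verifier might run.
    return all(isinstance(v, (int, float)) for v in values)


silent_result = buggy_max(raw_sales)      # wrong answer, no exception
correct_result = checked_max(raw_sales)   # numeric maximum
state_ok = probe_is_numeric(raw_sales)    # flags the untyped column
```

A reward model that only reads the interpreter output sees a clean run in the buggy case; inspecting the intermediate state is what reveals the type problem.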

Code & Models

Repositories

zjunlp/DataMind (GitHub)