Process Supervision for Chain-of-Thought Reasoning via Monte Carlo Net Information Gain

Corentin Royer, Debarun Bhattacharjya, Gaetano Rossiello, Andrea Giovannini, and Mennatallah El-Assady

arXiv:2603.17815 · cs.CL · March 19, 2026


TL;DR

This paper introduces a scalable, information-theoretic method for automatically generating step-level supervision signals for chain-of-thought reasoning in large language models, improving reliability and efficiency.

Contribution

It proposes a computationally efficient, information-theoretic approach for automatically generating step-level labels, reducing the labeling complexity from O(N log N) to O(N).
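The abstract describes these labels as an estimate of how each reasoning step changes the likelihood of the correct answer. Below is a minimal Python sketch of that idea. It assumes Monte Carlo rollouts from every step prefix and uses the change in log-probability of reaching the correct answer as the per-step gain. The helper names (`mc_success_rate`, `step_information_gains`, `step_labels`), the rollout callable, the rollout budget, the smoothing constant `eps`, and the zero threshold are all hypothetical choices for illustration. They are not the authors' exact formulation.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for "sample one completion from this prefix and check
# whether its final answer is correct"; in practice this would call an LLM.
Rollout = Callable[[List[str]], bool]


def mc_success_rate(prefix: List[str], rollout: Rollout, num_rollouts: int) -> float:
    """Monte Carlo estimate of P(correct answer | prefix)."""
    hits = sum(1 for _ in range(num_rollouts) if rollout(prefix))
    return hits / num_rollouts


def step_information_gains(steps: List[str], rollout: Rollout,
                           num_rollouts: int = 8, eps: float = 1e-6) -> List[float]:
    """Per-step change in log-probability of the correct answer (assumed form)."""
    # One estimate per prefix steps[:0], steps[:1], ..., steps[:N]:
    # N + 1 estimates for N steps, so total rollouts grow linearly with N.
    probs = [mc_success_rate(steps[:i], rollout, num_rollouts)
             for i in range(len(steps) + 1)]
    # eps keeps log() finite when no rollout succeeds from a prefix.
    return [math.log(probs[i + 1] + eps) - math.log(probs[i] + eps)
            for i in range(len(steps))]


def step_labels(gains: List[float], threshold: float = 0.0) -> List[bool]:
    """Label a step acceptable if it does not lower the success estimate."""
    return [g >= threshold for g in gains]


if __name__ == "__main__":
    steps = ["define x", "compute 2x", "sign error", "state answer"]
    # Deterministic toy rollout: fails whenever the prefix contains the bad step.
    gains = step_information_gains(steps, lambda p: "sign error" not in p)
    print(step_labels(gains))  # [True, True, False, True]
```

In this sketch the rollout budget is fixed per prefix, so the number of sampled completions scales linearly with the number of steps. A real rollout function would be stochastic, and a larger `num_rollouts` trades compute for lower-variance estimates.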

Findings

01. Effective chain-of-thought selection across diverse benchmarks.

02. Improved supervision reduces error propagation in multi-step reasoning.

03. The method is scalable and reduces reliance on costly annotations.

Abstract

Multi-step reasoning improves the capabilities of large language models (LLMs) but increases the risk of errors propagating through intermediate steps. Process reward models (PRMs) mitigate this by scoring each step individually, enabling fine-grained supervision and improved reliability. Existing methods for training PRMs rely on costly human annotations or computationally intensive automatic labeling. We propose a novel approach to automatically generate step-level labels using Information Theory. Our method estimates how each reasoning step affects the likelihood of the correct answer, providing a signal of step quality. Importantly, it reduces computational complexity to $O(N)$, improving over the previous $O(N \log N)$ methods. We demonstrate that these labels enable effective chain-of-thought selection in best-of-$K$ evaluation settings across diverse reasoning…
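In the best-of-$K$ setting described in the abstract, step-level scores are used to pick one chain among $K$ sampled candidates. The sketch below assumes each candidate already has one score per step, for example from a PRM, and that the scores are combined by taking either the minimum or the mean. The abstract does not say how step scores are aggregated, so both rules, the function names `chain_score` and `best_of_k`, and the example scores are illustrative assumptions.

```python
from typing import List, Sequence


def chain_score(step_scores: Sequence[float], how: str = "min") -> float:
    """Collapse per-step scores into one chain-level score."""
    if not step_scores:
        raise ValueError("a chain needs at least one scored step")
    if how == "min":
        return min(step_scores)  # chain is only as good as its weakest step
    if how == "mean":
        return sum(step_scores) / len(step_scores)
    raise ValueError(f"unknown aggregation: {how!r}")


def best_of_k(candidates: List[Sequence[float]], how: str = "min") -> int:
    """Return the index of the highest-scoring candidate (first one on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(range(len(candidates)), key=lambda k: chain_score(candidates[k], how))


if __name__ == "__main__":
    # Hypothetical per-step scores for K = 3 sampled chains.
    candidates = [[0.9, 0.2, 0.95], [0.7, 0.8, 0.75], [0.6, 0.99]]
    print(best_of_k(candidates, "min"))   # 1
    print(best_of_k(candidates, "mean"))  # 2
```

The two rules can disagree. Taking the minimum treats a chain as only as reliable as its weakest step, which fits the goal of catching errors that propagate. The mean is more forgiving of a single low-scoring step.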
