Process Supervision for Chain-of-Thought Reasoning via Monte Carlo Net Information Gain
Corentin Royer, Debarun Bhattacharjya, Gaetano Rossiello, Andrea Giovannini, and Mennatallah El-Assady

TL;DR
This paper introduces a scalable, information-theoretic method for automatically generating step-level supervision signals for chain-of-thought reasoning in large language models, improving reliability and efficiency.
Contribution
It proposes a computationally efficient, information-theoretic approach to generating step-level labels, reducing labeling complexity from O(N log N) to O(N).
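The linear-cost claim can be illustrated with a minimal sketch. Assume we already have an estimate p[t] of the probability that the model reaches the correct answer after the first t reasoning steps, with p[0] for the bare question. A hypothetical per-step label is then the reduction in surprisal of the correct answer, log p[t] - log p[t-1], which needs only one pass over the N steps. The function name `step_information_gain`, the `eps` clipping, and this exact formula are illustrative assumptions, not the paper's definition of net information gain.

```python
import math
from typing import List


def step_information_gain(p: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical per-step label: change in log-probability of the correct answer.

    p[0] is the estimated probability of reaching the correct answer from the
    question alone; p[t] is the estimate after the first t reasoning steps.
    Returns one value per step (len(p) - 1 values) in a single O(N) pass.
    """
    # Clip probabilities away from 0 so log() stays finite.
    logs = [math.log(max(x, eps)) for x in p]
    # Positive gain: the step made the correct answer more likely.
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]


# Step 1 leaves the estimate unchanged, step 2 raises it, step 3 lowers it.
gains = step_information_gain([0.25, 0.25, 0.5, 0.125])
print([round(g, 3) for g in gains])  # prints [0.0, 0.693, -1.386]
```

A step with a negative value is a natural candidate for a "bad step" label; where to put that threshold is a further design choice this sketch does not settle.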
Findings
Effective chain-of-thought selection across diverse benchmarks.
Improved supervision reduces error propagation in multi-step reasoning.
The method is scalable and reduces reliance on costly human annotations.
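Chain-of-thought selection with step-level scores can be sketched as a best-of selection loop: score every step of each sampled candidate chain with a process reward model, collapse the step scores into one chain score, and keep the highest-scoring chain. The sketch assumes min-aggregation (a chain is only as good as its weakest step), a common choice that is not necessarily the one used in the paper; `chain_score`, `select_best_chain`, and the toy scores are hypothetical.

```python
from typing import List, Sequence


def chain_score(step_scores: Sequence[float]) -> float:
    # Min-aggregation: a single bad step is enough to sink the whole chain.
    return min(step_scores)


def select_best_chain(candidates: List[Sequence[float]]) -> int:
    """Return the index of the candidate chain with the highest aggregated score."""
    if not candidates:
        raise ValueError("need at least one candidate chain")
    return max(range(len(candidates)), key=lambda i: chain_score(candidates[i]))


# Three candidate chains with hypothetical per-step PRM scores in [0, 1].
scores = [
    [0.9, 0.8, 0.2],  # strong start, one bad step
    [0.7, 0.7, 0.6],  # consistently decent
    [0.95, 0.3],      # short chain with a weak step
]
print(select_best_chain(scores))  # prints 1
```

Swapping `min` for a product or a mean changes which failure modes are penalized; min-aggregation favors chains with no weak link.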
Abstract
Multi-step reasoning improves the capabilities of large language models (LLMs) but increases the risk of errors propagating through intermediate steps. Process reward models (PRMs) mitigate this by scoring each step individually, enabling fine-grained supervision and improved reliability. Existing methods for training PRMs rely on costly human annotations or computationally intensive automatic labeling. We propose a novel approach to automatically generate step-level labels using information theory. Our method estimates how each reasoning step affects the likelihood of the correct answer, providing a signal of step quality. Importantly, it reduces computational complexity to O(N), improving over the O(N log N) complexity of previous methods. We demonstrate that these labels enable effective chain-of-thought selection in best-of- evaluation settings across diverse reasoning…
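Estimating how each step affects the likelihood of the correct answer can be approximated with Monte Carlo rollouts: from each reasoning prefix, sample several completions and count how often they end in the gold answer. One batch of rollouts per prefix means the number of estimates grows linearly with the number of steps. In this sketch, `sample_answer` is a hypothetical stand-in for an LLM call, and the rollout count, exact-match check, and toy sampler are assumptions for illustration only.

```python
import random
from typing import Callable, List

# Hypothetical sampler: given a reasoning prefix, return one final answer.
Sampler = Callable[[List[str]], str]


def estimate_correct_prob(
    question: str,
    steps: List[str],
    gold: str,
    sample_answer: Sampler,
    n_rollouts: int = 8,
) -> List[float]:
    """Monte Carlo estimate of P(correct | question + first t steps) for t = 0..N."""
    probs = []
    for t in range(len(steps) + 1):  # one batch of rollouts per prefix: N + 1 prefixes
        prefix = [question] + steps[:t]
        hits = sum(sample_answer(prefix) == gold for _ in range(n_rollouts))
        probs.append(hits / n_rollouts)
    return probs


def make_toy_sampler(seed: int = 0) -> Sampler:
    rng = random.Random(seed)

    def sampler(prefix: List[str]) -> str:
        # Toy model: the gold answer becomes more likely as the prefix grows.
        return "42" if rng.random() < min(1.0, 0.3 * len(prefix)) else "7"

    return sampler


probs = estimate_correct_prob("Q", ["s1", "s2", "s3"], "42", make_toy_sampler())
print(probs)
```

With the toy sampler the full-prefix estimate is always 1.0; the earlier values vary with the seed, which is the sampling noise a larger `n_rollouts` would reduce.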
