Probabilistic Soundness Guarantees in LLM Reasoning Chains

Weiqiu You; Anton Xue; Shreya Havaldar; Delip Rao; Helen Jin; Chris Callison-Burch; Eric Wong

arXiv:2507.12948 · cs.LG · September 30, 2025

TL;DR

This paper introduces ARES, a probabilistic framework for evaluating reasoning steps in LLMs, providing statistical soundness guarantees and improving error detection in reasoning chains.

Contribution

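The paper presents ARES (Autoregressive Reasoning Entailment Stability), a probabilistic method that scores each reasoning step using only previously verified premises and provides certified statistical guarantees of soundness, improving the detection of propagated errors over prior LLM-based approaches.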

Findings

01. ARES achieves 72.1% Macro-F1 across four benchmarks, an improvement of 8.2 points.

02. On very long synthetic reasoning chains, it detects propagated errors with 90.3% F1.

03. It is more robust and more accurate than existing LLM-based error detection methods.

Abstract

In reasoning chains generated by large language models (LLMs), initial errors often propagate and undermine the reliability of the final conclusion. Current LLM-based error detection methods often fail to detect propagated errors because earlier errors can corrupt judgments of downstream reasoning. To better detect such errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a probabilistic framework that evaluates each reasoning step based solely on previously-verified premises. This inductive method yields a nuanced score for each step and provides certified statistical guarantees of its soundness, rather than a brittle binary label. ARES achieves state-of-the-art performance across four benchmarks (72.1% Macro-F1, +8.2 points) and demonstrates superior robustness on very long synthetic reasoning chains, where it excels at detecting propagated errors (90.3% F1,…
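The step-by-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hoeffding_samples` and `ares_style_scores`, the `entail_prob` judge (a hypothetical stand-in for an LLM-based entailment model), the rule of keeping each earlier step with probability equal to its own score, and the use of a Hoeffding bound to choose the sample count are all assumptions made for illustration. The bound shown covers only each step's Monte Carlo estimate given the earlier scores, which may differ from the guarantee the paper certifies.

```python
import math
import random
from typing import Callable, List, Sequence


def hoeffding_samples(epsilon: float, delta: float) -> int:
    """Number of samples so that the mean of values in [0, 1] is within
    epsilon of its expectation with probability at least 1 - delta
    (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def ares_style_scores(
    premises: Sequence[str],
    steps: Sequence[str],
    entail_prob: Callable[[List[str], str], float],  # hypothetical judge
    epsilon: float = 0.1,
    delta: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Score each step using only premises and earlier steps that were
    sampled as sound (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    n = hoeffding_samples(epsilon, delta)
    scores: List[float] = []
    for k, step in enumerate(steps):
        total = 0.0
        for _ in range(n):
            # Premises are always trusted; each earlier step is kept with
            # probability equal to its own estimated soundness score.
            kept = [s for s, p in zip(steps[:k], scores) if rng.random() < p]
            total += entail_prob(list(premises) + kept, step)
        scores.append(total / n)
    return scores


# Toy judge: a step is entailed only if all of its dependencies are present.
DEPS = {"s1": ["p1"], "s2": ["unsupported"], "s3": ["s2"]}


def toy_judge(context: List[str], step: str) -> float:
    return 1.0 if all(d in context for d in DEPS[step]) else 0.0


print(ares_style_scores(["p1"], ["s1", "s2", "s3"], toy_judge))
# prints [1.0, 0.0, 0.0]
```

In the toy run, step `s2` has no support and scores 0, so it is never sampled into the context for `s3`. Step `s3` is therefore also scored 0, even though it follows from `s2`, which is the propagated-error behavior the abstract describes.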

Taxonomy

Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Service-Oriented Architecture and Web Services