Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

Hao-Yuan Chen

arXiv:2604.21611 · cs.CL · April 24, 2026

TL;DR

Verbal Process Supervision (VPS) is a training-free method that uses structured natural-language critique from a stronger supervisor to improve the reasoning of large language models at inference time.

Contribution

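VPS adds a fourth axis of inference-time scaling, the granularity of external verbal supervision, alongside chain depth, sample breadth, and learned step-scorers (PRMs). It requires no gradient updates and is evaluated on GPQA Diamond, AIME 2025, and LiveCodeBench V6.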

Findings

01

VPS reaches 94.9% accuracy on GPQA Diamond at a round budget of R=4, surpassing the previous state of the art of 94.1%.

02

On AIME 2025, VPS rescues weak actors, raising scores from 11.7-26.7% to 63.3-90.0% (up to +63.3 points).

03

At matched compute, VPS outperforms Reflexion (by +8.5 to +12.1 points) and Self-Consistency, pointing to critique granularity as a key factor.

Abstract

Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training-free framework that uses structured natural-language critique from a stronger supervisor to guide an iterative generate-critique-refine loop up to a round budget R. Across GPQA Diamond, AIME 2025, and LiveCodeBench V6 (covering both closed and open models), VPS yields three key results. First, on GPQA Diamond, GPT-5.4 (High) | GPT-5.4 (Low) reaches 94.9% at R=4, surpassing the 94.1% state of the art without gradient updates. Second, on AIME 2025, VPS enables strong weak-actor rescue, boosting scores from 11.7-26.7% to 63.3-90.0% (up to +63.3 points). Third, at matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points and…
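The generate-critique-refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `vps_loop`, the `actor` and `supervisor` callables, the early-stop rule, and the reading of one "round" as one critique followed by one refinement are all hypothetical. The toy actor and supervisor stand in for real model calls.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (not from the paper):
#   actor(problem, previous_attempt, critique) -> new attempt
#   supervisor(problem, attempt) -> (accepted, verbal critique)
Actor = Callable[[str, Optional[str], Optional[str]], str]
Supervisor = Callable[[str, str], Tuple[bool, str]]


def vps_loop(problem: str, actor: Actor, supervisor: Supervisor,
             max_rounds: int) -> Tuple[str, int]:
    """Generate once, then critique and refine for at most max_rounds rounds.

    Returns the final attempt and the number of critique rounds used.
    """
    attempt = actor(problem, None, None)  # initial generation, no critique yet
    rounds_used = 0
    for _ in range(max_rounds):
        accepted, critique = supervisor(problem, attempt)
        rounds_used += 1
        if accepted:  # assumed stopping rule: supervisor finds no issues
            break
        # Refine using the supervisor's natural-language feedback.
        attempt = actor(problem, attempt, critique)
    return attempt, rounds_used


# Toy stand-ins for model calls, purely for illustration.
def toy_actor(problem: str, previous: Optional[str],
              critique: Optional[str]) -> str:
    # Makes an arithmetic slip first, fixes it once it receives a critique.
    return "2 + 2 = 5" if critique is None else "2 + 2 = 4"


def toy_supervisor(problem: str, attempt: str) -> Tuple[bool, str]:
    if attempt.endswith("= 4"):
        return True, "No issues found."
    return False, "Step 1: the sum is wrong; recompute 2 + 2."


if __name__ == "__main__":
    answer, rounds = vps_loop("What is 2 + 2?", toy_actor, toy_supervisor,
                              max_rounds=4)
    print(answer, rounds)
```

In this sketch the round budget only caps the number of supervisor calls; a loop that stops early on acceptance spends fewer rounds on problems the actor already solves.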
