A Single Revision Step Improves Token-Efficient LLM Reasoning
Yingchuan Zhang, Terry Ma, Wenxuan Zhong, Ping Ma

TL;DR
This paper introduces PACER, a training-free inference framework in which a large language model's sampled reasoning traces peer-review one another and revise their conclusions, improving accuracy on challenging math benchmarks.
Contribution
PACER enables inference-time revision of reasoning traces through structured peer review, improving over traditional voting methods without additional training.
Findings
PACER matches or exceeds the accuracy of 256-sample majority voting.
PACER significantly outperforms raw ensemble baselines.
The method improves reasoning accuracy on math benchmarks such as AIME and BRUMO.
Abstract
Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods like majority voting or individual confidence-based filtering face a fundamental "blind spot": they evaluate each trace in isolation. As problems scale in difficulty, models often generate hallucinated paths that exhibit misleadingly high confidence, causing the true solution to be suppressed by a narrow margin in traditional voting. We ask: can we enable traces to "peer-review" each other to resolve these near-miss errors? We introduce Packet-Conditioned Revision (PACER), a training-free, inference-only framework that enables reasoning traces to revise their conclusions through a structured coordination step. After a preliminary screening of generated traces, PACER constructs a compact consensus…
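The "blind spot" the abstract describes can be illustrated with a minimal sketch of the two baseline aggregators it names, majority voting and confidence-based filtering. This is not PACER itself, whose coordination step is not fully described in the excerpt above. The trace data, the function names, and the `keep_fraction` parameter are all hypothetical and chosen only for illustration; the example assumes "42" is the true answer and "17" is a confidently hallucinated one.

```python
from collections import Counter

def majority_vote(answers):
    """Return (winner, margin), where margin is the vote gap to the runner-up."""
    counts = Counter(answers).most_common()
    winner, top = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return winner, top - runner_up

def confidence_filtered_vote(traces, keep_fraction=0.5):
    """Keep the most confident traces, then majority-vote their answers.

    keep_fraction is a hypothetical knob, not a value from the paper.
    """
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    return majority_vote([answer for answer, _ in kept])

# Hypothetical traces: (final answer, self-reported confidence).
# Assumption for illustration: "42" is correct, "17" is a hallucinated path.
traces = [
    ("42", 0.61), ("42", 0.58), ("42", 0.55),
    ("17", 0.93), ("17", 0.91), ("17", 0.90), ("17", 0.88),
]

plain = majority_vote([answer for answer, _ in traces])
filtered = confidence_filtered_vote(traces)
print(plain)     # winner and margin under plain voting
print(filtered)  # winner and margin after confidence filtering
```

In this toy case the wrong answer wins plain voting by a single vote, and confidence filtering widens the gap rather than closing it, because both aggregators score each trace in isolation. That is the near-miss situation a cross-trace revision step is meant to address.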
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
