Peer-Predictive Self-Training for Language Model Reasoning

Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, and Yiling Chen

arXiv:2604.13356·cs.CL·April 28, 2026

TL;DR

This paper introduces Peer-Predictive Self-Training (PST), a collaborative, label-free fine-tuning method where multiple language models improve through internal feedback without external supervision.

Contribution

PST uses a cross-model aggregated response as an internal training target and pointwise mutual information (PMI) to scale self-training updates, improving reasoning accuracy and reducing the generator-verifier gap without external labels.

Findings

01 PST improves exact-match accuracy by 2.2 to 4.3 percentage points.

02 PST reduces the generator-verifier gap by 26 to 40%.

03 PST requires no external supervision, relying solely on cross-model interactions.

Abstract

Mechanisms for continued self-improvement of language models without external supervision remain an open challenge. We propose Peer-Predictive Self-Training (PST), a label-free fine-tuning framework in which multiple language models improve collaboratively by leveraging a cross-model aggregated response as an internal training signal. Given a prompt question, the models generate responses sequentially; the final aggregated answer, often more reliable than individual responses in practice, serves as an internal target for learning. We measure how informative each intermediate response is about the aggregate using pointwise mutual information (PMI), and use this signal to scale self-training updates. Responses already aligned with the aggregate are updated less, while less informative or misaligned responses are updated more. On mathematical reasoning benchmarks (SimulEq, Math500, and…
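The PMI-based scaling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact weighting function, so the logistic map in `update_weight`, the `temperature` parameter, and both function names are assumptions chosen only to show the stated direction (higher PMI with the aggregate means a smaller update). The log-probabilities are assumed to come from a model scoring the aggregate answer with and without the intermediate response in context.

```python
import math


def pmi(logp_agg_given_response: float, logp_agg: float) -> float:
    """Pointwise mutual information between a response and the aggregate.

    PMI(r; a) = log p(a | r, q) - log p(a | q), where q is the prompt.
    Both arguments are natural-log probabilities of the aggregate answer.
    """
    return logp_agg_given_response - logp_agg


def update_weight(pmi_value: float, temperature: float = 1.0) -> float:
    """Hypothetical decreasing map from PMI to a loss weight in (0, 1).

    Computes sigmoid(-pmi / temperature) in a numerically stable way.
    Assumption: the paper's actual scaling rule may differ.
    """
    x = pmi_value / temperature
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# A response that doubles the probability of the aggregate answer
# (PMI = log 2) gets a smaller weight than one that halves it.
aligned = pmi(math.log(0.8), math.log(0.4))
misaligned = pmi(math.log(0.2), math.log(0.4))
w_aligned = update_weight(aligned)
w_misaligned = update_weight(misaligned)
```

In a training loop, a weight like this would multiply the per-response self-training loss, so responses that already predict the aggregate well contribute less to the gradient.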
