PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards

Mukesh Ghimire; Aosong Feng; Liwen You; Youzhi Luo; Fang Liu; Xuan Zhu

arXiv:2601.04700 · cs.CL · January 21, 2026

TL;DR

PRISM introduces a unified post-training framework for LLMs that combines a Process Reward Model (PRM) with internal confidence metrics to improve training stability and performance without relying on external labels or human supervision.
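The TL;DR does not say how the PRM score and the model's internal confidence are combined, so the following is only a minimal sketch under stated assumptions, not PRISM's actual rule. It assumes (hypothetically) that each sampled response to a prompt has per-step PRM scores in [0, 1] and a scalar self-certainty score. The PRM steps are aggregated by their minimum, both signals are standardized within the group of responses so their scales are comparable, and they are mixed with a hypothetical weight `alpha`. The function names and the mixing rule are illustrative only.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_step_score(step_scores: Sequence[float]) -> float:
    # Hypothetical aggregation: a solution is only as good as its weakest step.
    if not step_scores:
        raise ValueError("need at least one PRM step score")
    return min(step_scores)


def zscore(values: Sequence[float], eps: float = 1e-8) -> List[float]:
    # Standardize within one prompt's group of responses so that signals on
    # different scales (PRM probabilities vs. self-certainty in nats) are comparable.
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / (sd + eps) for v in values]


def combined_rewards(
    prm_steps: Sequence[Sequence[float]],
    certainties: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    # Hypothetical mixing rule: alpha * PRM signal + (1 - alpha) * confidence signal.
    if len(prm_steps) != len(certainties):
        raise ValueError("need one PRM trace and one certainty score per response")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prm = zscore([min_step_score(steps) for steps in prm_steps])
    cert = zscore(certainties)
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prm, cert)]


# Toy group of three responses to the same prompt (made-up numbers).
rewards = combined_rewards(
    prm_steps=[[0.9, 0.8, 0.95], [0.9, 0.2, 0.7], [0.6, 0.7, 0.65]],
    certainties=[1.4, 1.9, 1.1],
    alpha=0.5,
)
```

In this toy group the second response is the most self-certain but contains a weak middle step; under these assumptions the PRM term pulls its reward down and the first response ranks highest. Setting `alpha=0.0` gives a confidence-only ranking that would prefer the second response instead.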

Contribution

The paper proposes PRISM, a framework that integrates a Process Reward Model with the model's internal confidence to guide LLM training without ground-truth labels.

Findings

1. PRISM achieves more stable training than existing methods.
2. Combining the PRM with self-certainty improves test performance.
3. PRISM reduces reliance on costly human supervision.

Abstract

Current techniques for post-training Large Language Models (LLMs) rely either on costly human supervision or on external verifiers to boost performance on tasks such as mathematical reasoning and code generation. However, as LLMs improve at problem-solving, further improvement may require high-quality solutions to difficult problems that are not available to humans. As a result, learning from unlabeled data is becoming increasingly attractive to the research community. Existing methods extract a learning signal from a model's consistency, either through majority voting or by converting the model's internal confidence into a reward. Although internal consistency metrics such as entropy or self-certainty require no human intervention, we show in this work that they are unreliable signals for large-scale, long-term training. To address this unreliability, we propose PRISM, a…
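The abstract names three label-free signals: majority voting, entropy, and self-certainty. The sketch below shows one common way to compute each, either from a group of sampled final answers or from per-position next-token logits. It assumes self-certainty is the average KL divergence from the uniform distribution over the vocabulary to the model's next-token distribution, the definition used in earlier work on confidence-based rewards; the paper's own formulation is cut off in this excerpt. All helper names are hypothetical, and the toy four-token vocabulary stands in for a real one.

```python
import math
from collections import Counter
from typing import List, Sequence


def majority_vote_rewards(answers: Sequence[str]) -> List[float]:
    """Reward 1.0 for samples whose final answer matches the most common answer."""
    # The majority answer serves as a pseudo-label; ties go to the answer seen first.
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_token_entropy(step_logits: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (nats) over positions; lower means more confident."""
    total = 0.0
    for logits in step_logits:
        log_p = log_softmax(logits)
        total -= sum(math.exp(lp) * lp for lp in log_p)
    return total / len(step_logits)


def self_certainty(step_logits: Sequence[Sequence[float]]) -> float:
    """Average KL(U || p) over positions, U uniform over the vocabulary; higher means more confident."""
    total = 0.0
    for logits in step_logits:
        log_p = log_softmax(logits)
        vocab = len(log_p)
        # KL(U || p) = sum_j (1/V) * log((1/V) / p_j) = -log V - mean_j log p_j
        total += -math.log(vocab) - sum(log_p) / vocab
    return total / len(step_logits)
```

A uniform next-token distribution gives entropy log V and self-certainty 0, while a sharply peaked one drives entropy toward 0 and self-certainty upward. Both scores come from the model's own logits and need no labels.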

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education