RPRA: Predicting an LLM-Judge for Efficient but Performant Inference

Dylan R. Ashley; Gaël Le Lan; Changsheng Zhao; Naina Dhingra; Zhipeng Cai; Ernie Chang; Mingchen Zhuge; Yangyang Shi; Vikas Chandra; Jürgen Schmidhuber

arXiv:2604.12634 · cs.AI · April 15, 2026


TL;DR

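This paper explores how smaller language models can predict, before responding, how a larger LLM judge would score their output. They can then answer the queries they expect to handle well and defer the rest to larger models, enabling efficient yet accurate inference on resource-limited devices.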

Contribution

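It introduces the Reason-Predict-Reason-Answer/Act (RPRA) paradigm, alongside the simpler Predict-Answer/Act (PA) paradigm. It also evaluates approaches that let models predict how an LLM judge will score their responses, improving smaller models' ability to assess their own capabilities.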

Findings

1. Larger models perform well as judges in zero-shot settings.
2. Smaller models' prediction accuracy improves after fine-tuning or when they are given report cards.
3. Report cards and fine-tuning significantly enhance prediction accuracy.

Abstract

Large language models (LLMs) face a fundamental trade-off between computational efficiency (e.g., number of parameters) and output quality, especially when deployed on computationally limited devices such as phones or laptops. One way to address this challenge is to follow the example of humans and have models ask for help when they believe they are incapable of solving a problem on their own. We can overcome this trade-off by allowing smaller models to respond to queries when they believe they can provide good responses, and to defer to larger models when they do not believe they can. To this end, in this paper, we investigate the viability of Predict-Answer/Act (PA) and Reason-Predict-Reason-Answer/Act (RPRA) paradigms, where models predict, prior to responding, how an LLM judge would score their output. We evaluate three approaches: zero-shot prediction, prediction using an…
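The deferral scheme described in the abstract can be sketched as follows. This is a minimal illustration of the Predict-Answer/Act (PA) flow under our own assumptions, not the authors' implementation. The function `predict_then_answer`, the three callables, the toy stubs, and the `threshold` value are all hypothetical, and the score scale is arbitrary. Judging by its name, RPRA would additionally have the model reason before predicting and reason again before answering.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedResponse:
    text: str               # the final answer returned to the user
    predicted_score: float  # small model's estimate of the LLM judge's score
    deferred: bool          # True if the query was handed to the larger model


def predict_then_answer(
    query: str,
    predict_score: Callable[[str], float],  # hypothetical: small model predicts the judge's score
    small_answer: Callable[[str], str],     # hypothetical: small on-device model
    large_answer: Callable[[str], str],     # hypothetical: larger (e.g. remote) model
    threshold: float,                       # hypothetical confidence cut-off
) -> RoutedResponse:
    # Predict first, before generating anything, so no compute is spent
    # on a small-model answer that would be discarded anyway.
    score = predict_score(query)
    if score >= threshold:
        # The small model expects a good judge score, so it answers itself.
        return RoutedResponse(small_answer(query), score, deferred=False)
    # Otherwise it "asks for help" by deferring to the larger model.
    return RoutedResponse(large_answer(query), score, deferred=True)


# Toy stand-ins for real models (purely illustrative):
# short queries are treated as easy, long ones as hard.
def toy_predict(q: str) -> float:
    return 0.9 if len(q) < 20 else 0.2


def toy_small(q: str) -> str:
    return f"small:{q}"


def toy_large(q: str) -> str:
    return f"large:{q}"


r1 = predict_then_answer("2+2?", toy_predict, toy_small, toy_large, threshold=0.5)
r2 = predict_then_answer(
    "Prove the Riemann hypothesis, please.", toy_predict, toy_small, toy_large, threshold=0.5
)
print(r1)
print(r2)
```

The threshold sets the trade-off: raising it sends more queries to the larger model, which improves quality at higher cost. Lowering it keeps more work on the device. How well this works depends on how accurately the small model predicts the judge's score.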
