SCATR: Simple Calibrated Test-Time Ranking

Divya Shyamal; Marta Knežević; Lan Tran; Chanakya Ekbote; Vijay Lingam; Paul Pu Liang

arXiv:2604.16535·cs.LG·April 22, 2026

TL;DR

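SCATR is an efficient Best-of-N ranking method for large language models. It learns a lightweight scorer from a small calibration set, using hidden representations from the base model, and uses that scorer to rank candidate responses at test time, offering a strong accuracy-efficiency balance.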

Contribution

Introduces SCATR, a simple, calibration-based ranking method that rivals learned scorers at significantly lower training and inference cost.

Findings

01

SCATR improves over lightweight confidence heuristics by up to 9% on the evaluated benchmarks.

02

Achieves accuracy comparable to LoRA fine-tuning with 8000x fewer parameters.

03

Reduces training and inference latency by up to 150x and 1000x, respectively.

Abstract

Test-time scaling (TTS) improves large language models (LLMs) by allocating additional compute at inference time. In practice, TTS is often achieved through parallel scaling: generating multiple candidate responses and selecting the best via a Best-of-N (BoN) strategy. Its effectiveness therefore hinges on the scoring function. Learned scorers such as process reward models (PRMs) can be strong, but they are expensive to train and run. Lightweight confidence heuristics based on token log-probabilities are much cheaper, yet we find that they often perform substantially worse. To improve on lightweight confidence heuristics without incurring the full cost of stronger learned scorers, we introduce SCATR, a simple and efficient BoN ranking method that learns a lightweight scorer from a small calibration set using hidden representations from the base model. Across coding and mathematical…
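
The Best-of-N setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scorer's form, so the sketch assumes a logistic-regression probe trained on one pooled hidden-state vector per response. All function names here (`mean_logprob_score`, `fit_linear_scorer`, `best_of_n`) and the hyperparameters are hypothetical, and the feature vectors are assumed to have been extracted from the base model beforehand.

```python
import numpy as np


def mean_logprob_score(token_logprobs):
    """Baseline confidence heuristic: mean token log-probability of one response."""
    return float(np.mean(token_logprobs))


def fit_linear_scorer(features, labels, lr=0.5, steps=500, l2=1e-3):
    """Fit a logistic-regression probe on a calibration set.

    features: (n, d) array, one pooled hidden-state vector per response
              (assumed to be precomputed from the base model).
    labels:   (n,) array of 0/1 correctness labels.
    Returns the weight vector w and bias b.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the log-loss w.r.t. the logits
        w -= lr * (features.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def best_of_n(candidate_features, w, b):
    """Score N candidates with the probe and return (best index, all scores)."""
    scores = np.asarray(candidate_features, dtype=float) @ w + b
    return int(np.argmax(scores)), scores
```

In this sketch the only trained parameters are `w` and `b`, which is what keeps a probe of this kind cheap relative to a separately trained reward model. To use it, fit the probe once on calibration responses with known correctness, then call `best_of_n` on the features of the N sampled responses for each new prompt.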
