Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling

Indranil Halder; Cengiz Pehlevan

arXiv:2512.19905·cs.LG·February 13, 2026

Open Access

TL;DR

This paper introduces an analytically tractable model for understanding inference-time scaling in large language models, revealing how sampling strategies and reward misspecification affect generalization error and optimal inference-time computation.

Contribution

The paper provides a theoretical framework for inference-time scaling in LLMs using Bayesian linear regression, analyzing the effects of sampling and reward misspecification on generalization.

Findings

01. Generalization error decreases as $1/k^2$ with optimal reward and sampling temperature.

02. Reward misspecification can lead to a finite optimal number of inference samples.

03. Inference-time compute advantage diminishes with increasing task difficulty.

Abstract

Recent developments in large language models have shown advantages in reallocating a notable share of computational resources from training time to inference time. However, the principles behind inference-time scaling are not well understood. In this paper, we introduce an analytically tractable model of inference-time scaling: Bayesian linear regression with a reward-weighted sampler, where the reward is determined by a linear model, modeling the LLM-as-a-judge scenario. We study this problem in the high-dimensional regime, where deterministic equivalents give closed-form expressions for the posterior predictive mean and variance. We analyze the generalization error when training data are sampled from a teacher model. We draw $k$ inference-time samples and select one via a softmax, at a given temperature, applied to a quadratic reward. When the reward is not too different from the teacher, the…
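The setup described in the abstract can be illustrated with a small finite-size Monte Carlo simulation. This is only a sketch under stated assumptions, not the paper's high-dimensional analysis: the dimensions, the Gaussian prior, the noise level, the inclusion of observation noise in the predictive variance, the way the judge is perturbed away from the teacher (`reward_shift`), and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def posterior(X, y, noise_var, prior_var):
    """Gaussian posterior over weights for Bayesian linear regression."""
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

def select_sample(x, mean, cov, w_reward, k, temperature, noise_var, rng):
    """Draw k candidates from the posterior predictive, pick one by softmax."""
    pred_mean = x @ mean
    pred_var = x @ cov @ x + noise_var  # assumption: noise included
    candidates = rng.normal(pred_mean, np.sqrt(pred_var), size=k)
    # Quadratic reward from a linear "judge" model w_reward.
    rewards = -(candidates - x @ w_reward) ** 2
    logits = rewards / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(candidates, p=probs)

def generalization_error(k, reward_shift=0.0, temperature=0.1,
                         d=20, n=40, noise_var=0.25, prior_var=1.0,
                         trials=2000, seed=0):
    """Monte Carlo estimate of E[(selected - teacher output)^2]."""
    rng = np.random.default_rng(seed)
    w_teacher = rng.normal(size=d)
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    y = X @ w_teacher + rng.normal(scale=np.sqrt(noise_var), size=n)
    # Hypothetical misspecification: judge = teacher + shift * random direction.
    w_reward = w_teacher + reward_shift * rng.normal(size=d)
    mean, cov = posterior(X, y, noise_var, prior_var)
    errors = []
    for _ in range(trials):
        x = rng.normal(size=d) / np.sqrt(d)
        chosen = select_sample(x, mean, cov, w_reward, k,
                               temperature, noise_var, rng)
        errors.append((chosen - x @ w_teacher) ** 2)
    return float(np.mean(errors))

if __name__ == "__main__":
    for shift in (0.0, 3.0):
        for k in (1, 4, 32):
            err = generalization_error(k, reward_shift=shift)
            print(f"reward_shift={shift}, k={k}: error={err:.3f}")
```

Sweeping `k`, `temperature`, and `reward_shift` in this toy gives a rough way to probe the qualitative behavior listed in the findings, for example whether more samples keep helping when the judge is far from the teacher. The numbers it prints are specific to these assumed settings and are not results from the paper.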

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science