Forecasting Downstream Performance of LLMs With Proxy Metrics

Arkil Patel; Siva Reddy; Marius Mosbach; Dzmitry Bahdanau

arXiv:2605.18607·cs.CL·May 19, 2026

TL;DR

The paper builds proxy metrics from token-level statistics of a model's next-token distribution over expert-written solutions and uses them to forecast downstream performance. Across three settings, the proxies outperform loss- and compute-based baselines.

Contribution

The paper proposes proxy metrics that aggregate token-level statistics (entropy, top-k accuracy, and expert token rank) computed on expert-written solutions, as a forecasting signal for comparative decisions during language model development.

Findings

01. Proxy metrics outperform loss- and compute-based baselines in model ranking.

02. Proxy metrics rank candidate pretraining corpora with 10,000x less compute.

03. Proxy metrics forecast downstream accuracy with half the error of existing methods.

Abstract

Progress in language model development is often driven by comparative decisions: which architecture to adopt, which pretraining corpus to use, or which training recipe to apply. Making these decisions well requires reliable performance forecasts, yet the two commonly used signals are fundamentally limited. Cross-entropy loss is poorly aligned with downstream capabilities, and direct downstream evaluation is expensive, sparse, and often uninformative at early training stages. Instead, we propose to construct proxy metrics by aggregating token-level statistics, such as entropy, top-k accuracy, and expert token rank, from a candidate model's next-token distribution over expert-written solutions. Across three settings, our proxies consistently outperform loss- and compute-based baselines: 1) For cross-family model selection, they rank a heterogeneous population of reasoning models with mean…
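The three token-level statistics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `token_stats`, the input layout (one row of next-token logits per expert token), the tie-handling in the rank computation, and the choice to aggregate by a simple mean are all assumptions made for illustration.

```python
import numpy as np

def token_stats(logits, expert_ids, k=5):
    """Aggregate token-level statistics over one expert-written solution.

    Hypothetical helper, not taken from the paper's repository.
    logits:     (T, V) array, the model's next-token logits at each position.
    expert_ids: (T,) integer array, the token the expert actually wrote.
    """
    logits = np.asarray(logits, dtype=np.float64)
    expert_ids = np.asarray(expert_ids)

    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)

    # Entropy of the predicted distribution at each position (in nats).
    entropy = -(probs * log_probs).sum(axis=1)

    # Rank of the expert token: 1 means it is the model's top choice.
    # Assumption: ties are resolved in the expert token's favour.
    expert_logit = logits[np.arange(len(expert_ids)), expert_ids]
    rank = (logits > expert_logit[:, None]).sum(axis=1) + 1

    # Assumption: statistics are aggregated with an unweighted mean.
    return {
        "mean_entropy": float(entropy.mean()),
        "top_k_accuracy": float((rank <= k).mean()),
        "mean_expert_rank": float(rank.mean()),
    }
```

Under these assumptions, a candidate model would be scored by running it over a set of expert solutions, collecting the logits at each position, and comparing the resulting aggregates across candidates. How the paper actually combines or weights the statistics is not stated in the truncated abstract above.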

Code & Models

Repositories

mcgill-nlp/proxy-metrics (GitHub)
