Scaling Laws for Cross-Encoder Reranking
Rahul Seetharaman, Aman Bansal, Hamed Zamani, Kaustubh Dhole

TL;DR
This paper studies how cross-encoder rerankers scale with model size and training data. Ranking quality follows predictable power laws, which makes it possible to forecast the performance of larger rerankers from smaller runs and to plan compute allocation.
Contribution
This is the first systematic study of scaling laws for cross-encoder rerankers, covering pointwise, pairwise, and listwise objectives. It provides practical principles for forecasting model performance and planning training resources.
Findings
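Ranking quality follows predictable power laws in model size and training exposure.
Forecasts for 400M- and 1B-parameter rerankers, fitted on models up to 150M parameters, are accurate and typically conservative.
Compute-allocation rules derived from the fitted joint scaling law often favor data-heavy scaling for retrieval metrics, though the recommendation depends on the training objective.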
Abstract
Scaling laws are well studied for language models and first-stage retrieval, but not for reranking. We present the first systematic study of scaling laws for cross-encoder rerankers across pointwise, pairwise, and listwise objectives. Across model size and training exposure, ranking quality follows predictable power laws, enabling larger rerankers to be forecast from smaller runs. Using models up to 150M parameters, we forecast 400M and 1B rerankers on MSMARCO-dev and TREC DL. Beyond forecasting, we derive compute-allocation rules from the fitted joint scaling law and compare them with equal-compute checkpoints, showing that retrieval metrics often favor data-heavy scaling, though the recommendation depends on the training objective. The forecasts are accurate and typically conservative, making them useful for planning expensive large-model training. These results provide practical…
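The forecasting idea in the abstract can be illustrated with a minimal sketch. It assumes a plain power law of the form error ≈ a · N^(-b) in model size N, fitted by least squares in log-log space. The functional form, the error metric (for example 1 − NDCG@10), the model sizes, and the constants below are all illustrative assumptions; the paper's actual fitted law and data are not reproduced here.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~= a * sizes**(-b) via linear least squares on log-log data.

    Returns (a, b). Assumes all sizes and errors are strictly positive.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def forecast(a, b, size):
    """Evaluate the fitted power law at a (possibly unseen) model size."""
    return a * size ** (-b)


# Hypothetical small-model sizes (parameters), capped at 150M as in the abstract.
sizes = np.array([17e6, 32e6, 68e6, 150e6])

# Synthetic, noise-free "ranking error" values generated from made-up constants.
# These are NOT results from the paper.
true_a, true_b = 5.0, 0.12
errors = true_a * sizes ** (-true_b)

a, b = fit_power_law(sizes, errors)

# Extrapolate to the larger sizes the abstract mentions.
pred_400m = forecast(a, b, 400e6)
pred_1b = forecast(a, b, 1e9)

print(f"fitted a={a:.4f}, b={b:.4f}")
print(f"forecast error at 400M: {pred_400m:.4f}")
print(f"forecast error at 1B:   {pred_1b:.4f}")
```

With real measurements the points carry noise, so the fit should be checked on a held-out larger model before trusting an extrapolation. A joint law in model size and training exposure would need an additional data term, which this sketch omits.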
