Scaling Laws for Online Advertisement Retrieval
Yunli Wang, Zhen Zhang, Zixuan Yang, Tianyu Xu, Zhiqiang Wang, Yu Li, Rufan Zhou, Zhiqiang Liu, Yanjie Zhu, Jian Yang, Shiyang Wen, Peng Jiang

TL;DR
This paper introduces a lightweight offline method to identify and apply scaling laws in online advertisement retrieval systems, enabling efficient model design and resource allocation without extensive online experimentation.
Contribution
It proposes a novel offline metric and simulation algorithm to discover online scaling laws, validated across multiple architectures and practical advertising scenarios.
Findings
The offline metric correlates strongly with online revenue.
Scaling laws are consistent across different model architectures.
The method enables rapid offline estimation of costs and revenues.
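One of the peer reviews on this page characterizes the paper's R/R* metric as functionally a "revenue-weighted recall". The sketch below illustrates that reading only; it is not the paper's Equation (2), which is not reproduced here. The function name, the `revenue_by_ad` mapping, and the top-k formulation are all hypothetical choices made for illustration.

```python
def revenue_weighted_recall(retrieved_ids, revenue_by_ad, k):
    """Fraction of the best achievable top-k revenue captured by a retrieved set.

    Illustrative assumption: R is the revenue of the ads a retrieval model
    returns, and R* is the revenue of the ideal top-k ads for the same request.
    """
    # R: total revenue of the first k retrieved ads, counting each ad once.
    seen = set()
    r = 0.0
    for ad in retrieved_ids[:k]:
        if ad not in seen:
            seen.add(ad)
            r += revenue_by_ad.get(ad, 0.0)

    # R*: total revenue of the k highest-revenue candidates.
    r_star = sum(sorted(revenue_by_ad.values(), reverse=True)[:k])

    # Guard against requests where no candidate carries any revenue.
    return r / r_star if r_star > 0 else 0.0


# Hypothetical per-request revenues for four candidate ads.
revenue = {"a": 5.0, "b": 3.0, "c": 2.0, "d": 1.0}
ratio = revenue_weighted_recall(["a", "c"], revenue, k=2)  # R = 7.0, R* = 8.0
```

Under this reading, a model that retrieves exactly the highest-revenue ads scores 1.0, and missing a high-revenue ad costs more than missing a low-revenue one.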
Abstract
The scaling law is a notable property of neural network models and has significantly propelled the development of large language models. Scaling laws hold great promise in guiding model design and resource allocation. Recent research increasingly shows that scaling laws are not limited to NLP tasks or Transformer architectures; they also apply to domains such as recommendation. However, there is still a lack of literature on scaling law research in online advertisement retrieval systems. This may be because 1) identifying the scaling law for resource cost and online revenue is often expensive in both time and training resources for industrial applications, and 2) varying settings for different systems prevent the scaling law from being applied across various scenarios. To address these issues, we propose a lightweight paradigm to identify online scaling laws of retrieval models,…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper is well structured and the writing is clean and easy to follow.
2. The proposed methodology for establishing scaling laws is well principled.
3. The paper presents proper justifications for key design choices taken.
1. Closed evaluation - I acknowledge that the nature of the problem tackled in the paper makes it difficult to present these results in an open setting. Nonetheless, because all experiments rely on closed source code and data, the results are impossible to replicate or reproduce.
2. Scope of the results - I am unsure whether the results presented in the paper hold in other settings or are of interest to the ICLR community, which as per my understanding focuses more on learning algorithms, architecture or a better un
High Practical Impact and Significance: The paper tackles a critical and expensive problem for any large-scale industrial ML retrieval system: how to perform cost-aware model development and resource allocation. The ability to accurately estimate the ROI of a model configuration offline is extremely valuable. The reported +5.1% online revenue gain from applying this framework is a very strong testament to its practical utility.
Holistic and Well-Designed Framework: The authors present a complet
Limited Conceptual Novelty: The R/R* metric is functionally a "revenue-weighted recall." Can the authors comment on the novelty of this metric in the context of prior work on utility-based or business-value-weighted metrics in recommender systems and information retrieval? The paper's strength seems to be its empirical validation rather than the novelty of the metric's formulation. The contribution is more of an engineering one—successfully applying and validating this known concept in a new dom
As real-world advertising systems become increasingly complex and large-scale, it is important to make changes that comply with ROI constraints. This paper proposes a heuristic scaling law to study the trade-off between cost and return. Overall, the paper is interesting and useful.
My main concern is that the writing could be substantially improved, as the current version of the paper is not accessible to the general ICLR audience.
1. The key concept $R / R^*$ is mentioned several times in the Introduction and Abstract, but it is never explained, even heuristically.
2. The definition of $R / R^*$ in Equation (2) is difficult to understand. What is the “hard permutation matrix”? It should be clearly defined, and a heuristic explanation would be helpful.
3. How is the groun
Taxonomy
Topics: Consumer Market Behavior and Pricing
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization
