Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment

Dai Li; Kevin Course; Wei Li; Hongwei Li; Jie Hua; Yiqi Chen; Zhao Zhu; Rui Jian; Xuan Cao; Bi Xue; Yu Shi; Jing Qian; Kai Ren; Matt Ma; Qunshu Zhang; Rui Li

arXiv:2508.02929·cs.IR·August 8, 2025

TL;DR

This paper introduces a Foundation-Expert paradigm for hyperscale recommender systems, in which a central foundation model is adapted efficiently to diverse recommendation tasks. The approach is deployed at Meta, where the authors report significant performance and efficiency gains.

Contribution

The paper presents a novel framework and infrastructure for deploying hyperscale recommendation models, enabling scalable, efficient adaptation across multiple recommendation surfaces.

Findings

01 Deployed at Meta, serving tens of billions of requests daily

02 Achieved online metric improvements over previous systems

03 Improved developer velocity and infrastructure efficiency

Abstract

While scaling laws promise significant performance gains for recommender systems, efficiently deploying hyperscale models remains a major unsolved challenge. In contrast to fields where foundation models (FMs) are already widely adopted, such as natural language processing and computer vision, progress in recommender systems is hindered by unique challenges including the need to learn from online streaming data under shifting data distributions, the need to adapt to different recommendation surfaces with a wide diversity in their downstream tasks and their input distributions, and stringent latency and computational constraints. To bridge this gap, we propose to leverage the Foundation-Expert Paradigm: a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable…
