Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Bojian Hou; Xiaolong Liu; Xiaoyi Liu; Jiaqi Xu; Yasmine Badr; Mengyue Hang; Sudhanshu Chanpuriya; Junqing Zhou; Yuhang Yang; Han Xu; Qiuling Suo; Laming Chen; Yuxi Hu; Jiasheng Zhang; Huaqing Xiong; Yuzhen Huang; Chao Chen; Yue Dong; Yi Yang; Shuo Chang; Xiaorui Gan; Wenlin Chen; Santanu Kolay; Darren Liu; Jade Nie; Chunzhi Yang; Ellie Wen; Jiyan Yang; Huayu Li

arXiv:2602.10016 · cs.IR · February 17, 2026

TL;DR

Kunlun is a scalable architecture for recommendation systems that improves efficiency and resource utilization, enabling predictable scaling laws and significant performance gains in large-scale industrial applications.

Contribution

The paper introduces Kunlun, a unified architecture with novel low- and high-level optimizations that enhance scaling efficiency and model performance for recommendation systems.

Findings

01. Increased Model FLOPs Utilization (MFU) from 17% to 37%.

02. Doubled scaling efficiency compared to previous methods.

03. Deployed in Meta Ads with significant production impact.
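
For readers unfamiliar with the MFU figure in the first finding, the sketch below shows how the metric is commonly computed: model FLOPs actually executed per second, divided by the aggregate peak FLOPs per second of the hardware. The function name, its parameters, and every number are hypothetical and chosen only for illustration; none of them come from the paper.

```python
def model_flops_utilization(model_flops_per_step: float,
                            step_time_s: float,
                            num_devices: int,
                            peak_flops_per_device: float) -> float:
    """Return MFU as a fraction in [0, 1].

    MFU = (model FLOPs executed per second) / (aggregate peak FLOPs per second).
    All arguments are illustrative assumptions, not values from the paper.
    """
    achieved_flops_per_s = model_flops_per_step / step_time_s
    peak_flops_per_s = num_devices * peak_flops_per_device
    return achieved_flops_per_s / peak_flops_per_s


# Hypothetical setup: 8 devices, each with a peak of 1e15 FLOP/s,
# running a model that needs 2e15 FLOPs per training step.
slow = model_flops_utilization(2e15, step_time_s=1.0,
                               num_devices=8, peak_flops_per_device=1e15)
fast = model_flops_utilization(2e15, step_time_s=0.5,
                               num_devices=8, peak_flops_per_device=1e15)
print(f"MFU at 1.0 s/step: {slow:.2%}")
print(f"MFU at 0.5 s/step: {fast:.2%}")
```

With the hardware and the model FLOPs held fixed, halving the step time doubles MFU, which is why kernel-level and architectural efficiency work shows up directly in this metric.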

Abstract

Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip…
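
As a rough illustration of what "predictable power-law scaling" means in practice, the sketch below fits a power law to (compute, loss) pairs by least squares in log-log space. It assumes the simplest form, loss ≈ a · compute^(−b), with no irreducible-loss term; the functional form and metric the paper actually uses are not given in this excerpt. The function name `fit_power_law` and the data points are hypothetical, and the data is synthetic, generated from a known law so the fit can be checked.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) via ordinary least squares on logs.

    Taking logs gives log(loss) = log(a) - b * log(compute), a straight line
    whose slope and intercept are estimated in closed form.
    Returns the tuple (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from loss = 2.0 * compute**(-0.05).
# These values are illustrative only and are not results from the paper.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [2.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a = {a:.4f}, b = {b:.4f}")
```

On real measurements the points scatter around the line, and how tightly they follow it is one way to judge whether scaling is predictable enough to guide resource allocation.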

Taxonomy

Topics: Recommender Systems and Techniques · Big Data and Digital Economy · Explainable Artificial Intelligence (XAI)