RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
Jin Chen, Shangyu Zhang, Bin Hu, Chao Zhou, Junwei Pan, Gengsheng Xue, Wentao Ning, Gengyu Weng, Wang Zheng, Shaohua Liu, Zeen Xu, Chengyuan Mai, Shijie Quan, Tingyu Jiang, Lifeng Wang, Shudong Huang, Chengguo Yin, Haijie Gu, Jie Jiang

TL;DR
RankUp introduces an architecture that improves representation capacity in large-scale recommender systems by mitigating representation collapse, yielding GMV improvements of up to 4.81% in production.
Contribution
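The paper proposes RankUp, an architecture that combines randomized permutation splitting over sparse features, a multi-embedding paradigm, global token integration, and crossed pretrained embedding tokens to mitigate representation collapse and enhance expressive capacity in recommender systems.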
Findings
RankUp improves GMV by up to 4.81% in production.
Representation capacity scales better with depth using RankUp.
RankMixer's effective rank oscillates and degrades in deep layers, motivating RankUp.
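The effective rank referred to in the findings above can be made concrete with a small sketch. This summary does not say which definition the paper uses, so the code below assumes one common choice: the exponential of the Shannon entropy of the normalized singular values of a token matrix. The function name `effective_rank`, the `eps` parameter, and the toy matrices are illustrative assumptions, not taken from RankUp or RankMixer.

```python
import numpy as np


def effective_rank(tokens: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (num_tokens, dim) matrix.

    Assumed definition (not confirmed by the paper summary):
    exp(H(p)), where p is the singular-value spectrum normalized to sum to 1.
    """
    # Singular values only; the singular vectors are not needed here.
    s = np.linalg.svd(tokens, compute_uv=False)
    p = s / (s.sum() + eps)
    # Drop numerically zero entries so that log(p) stays finite.
    p = p[p > eps]
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))


rng = np.random.default_rng(0)

# Toy "healthy" layer output: 16 tokens with independent 64-dim representations.
full = rng.standard_normal((16, 64))

# Toy "collapsed" layer output: every token is a scaled copy of one direction.
collapsed = np.outer(rng.standard_normal(16), rng.standard_normal(64))

print("full:", effective_rank(full))
print("collapsed:", effective_rank(collapsed))
```

Under this definition the collapsed matrix scores close to 1, while the matrix of independent tokens scores close to the number of tokens. Tracking such a quantity layer by layer is one way to observe the oscillation and degradation described for RankMixer.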
Abstract
The scaling laws for recommender systems have been increasingly validated, where MetaFormer-based architectures consistently benefit from increased model depth, hidden dimensionality, and user behavior sequence length. However, whether representation capacity scales proportionally with parameter growth remains unexplored. Prior studies on RankMixer reveal that the effective rank of token representations exhibits a damped oscillatory trajectory across layers, failing to increase consistently with depth and even degrading in deeper layers. Motivated by this observation, we propose RankUp, an architecture designed to mitigate representation collapse and enhance expressive capacity through randomized permutation splitting over sparse features, a multi-embedding paradigm, global token integration and crossed pretrained embedding tokens. RankUp has been fully deployed in large-scale…
