RankMixer: Scaling Up Ranking Models in Industrial Recommenders

Jie Zhu; Zhifang Fan; Xiaoxie Zhu; Yuchen Jiang; Hangyu Wang; Xintian Han; Haoran Ding; Xinmin Wang; Wenlin Zhao; Zhen Gong; Huizhi Yang; Zheng Chai; Zhe Chen; Yuchao Zheng; Qiwei Chen; Feng Zhang; Xun Zhou; Peng Xu; Xiao Yang; Di Wu; Zuotao Liu

arXiv:2507.15551 · cs.IR · July 29, 2025

TL;DR

RankMixer is a scalable, hardware-aware ranking-model architecture for industrial recommender systems. It substantially improves GPU efficiency and scalability, enabling billion-parameter models at roughly unchanged serving latency.

Contribution

RankMixer introduces a GPU-efficient feature-interaction architecture built around a multi-head token-mixing module, and extends it to billion-parameter scale with a Sparse Mixture-of-Experts (Sparse-MoE) variant, demonstrating superior scaling and efficiency.
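The summary does not describe how RankMixer's Sparse-MoE routes inputs. For readers unfamiliar with the idea, here is a generic top-k gated mixture-of-experts layer. It is a minimal sketch and not the paper's routing scheme: a router scores every expert, only the `top_k` highest-scoring experts are evaluated, and their outputs are combined with softmax weights renormalized over the chosen experts. The function names, the toy experts, and `top_k` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]  # each expert maps a vector to one of the same size


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x: Vector, router: List[Vector], experts: List[Expert], top_k: int) -> Vector:
    """Generic top-k gated MoE layer (sketch): only the top_k experts run for x."""
    # One router logit per expert: dot product of that expert's router row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalized over chosen experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # unchosen experts are never called
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy usage: four hypothetical experts that scale their input and log each call.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Expert:
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
router = [[0.1, 0.0], [0.5, 0.0], [0.2, 0.0], [0.9, 0.0]]
y = sparse_moe([1.0, 0.0], router, experts, top_k=2)
print(sorted(calls))  # [1, 3]: only the two highest-scoring experts were evaluated
```

Total parameters grow with the number of experts, while per-input compute grows only with `top_k`; this is the general reason sparse MoE layers are used to raise parameter counts without a proportional rise in compute.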

Findings

1. Boosts Model FLOPs Utilization (MFU) from 4.5% to 45%.
2. Scales parameter count by 100x at similar latency.
3. Improves user-engagement metrics in online tests.
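MFU is the fraction of the accelerator's peak FLOP/s that goes into the model's own arithmetic. A back-of-the-envelope sketch, assuming fixed hardware, a fixed required throughput, and a cost model where only FLOPs matter (memory bandwidth, communication, and kernel overheads are ignored): raising MFU from 4.5% to 45% lets each sample use about 10x more FLOPs. The peak and throughput figures and the helper names are hypothetical.

```python
def mfu(flops_per_sample: float, samples_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s divided by hardware peak FLOP/s."""
    if peak_flops_per_sec <= 0:
        raise ValueError("peak_flops_per_sec must be positive")
    return flops_per_sample * samples_per_sec / peak_flops_per_sec


def affordable_flops_per_sample(mfu_value: float, samples_per_sec: float,
                                peak_flops_per_sec: float) -> float:
    """Per-sample FLOP budget at a fixed throughput for a given MFU (simplified model)."""
    return mfu_value * peak_flops_per_sec / samples_per_sec


PEAK = 100e12      # hypothetical accelerator peak: 100 TFLOP/s
THROUGHPUT = 1e5   # hypothetical samples per second the service must sustain

budget_before = affordable_flops_per_sample(0.045, THROUGHPUT, PEAK)  # MFU 4.5%
budget_after = affordable_flops_per_sample(0.45, THROUGHPUT, PEAK)    # MFU 45%
print(f"per-sample FLOP budget grows {budget_after / budget_before:.1f}x")
```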

Abstract

Recent progress on large language models (LLMs) has spurred interest in scaling up recommendation systems, yet two practical obstacles remain. First, training and serving costs in industrial recommenders must respect strict latency bounds and high-QPS demands. Second, most human-designed feature-crossing modules in ranking models were inherited from the CPU era and fail to exploit modern GPUs, resulting in low Model FLOPs Utilization (MFU) and poor scalability. We introduce RankMixer, a hardware-aware model design tailored to a unified and scalable feature-interaction architecture. RankMixer retains the transformer's high parallelism while replacing quadratic self-attention with a multi-head token-mixing module for higher efficiency. In addition, with per-token FFNs, RankMixer models both distinct feature subspaces and cross-feature-space interactions. We further extend…
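The abstract names two building blocks, multi-head token mixing in place of self-attention and per-token FFNs, without giving their equations. A minimal sketch under stated assumptions: token mixing is taken to be a parameter-free split-and-regroup (each token is cut into H head slices, and output token h concatenates slice h of every input token), with H equal to the number of tokens T so the output keeps the input shape; each position then gets its own two-layer ReLU FFN with unshared weights. These specifics, the function names, and the sizes are hypothetical, and residual connections and normalization are omitted.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]  # row-major: Matrix[out][in]


def token_mixing(tokens: List[Vector], num_heads: int) -> List[Vector]:
    """Parameter-free multi-head token mixing (assumed split-and-regroup scheme).

    Each of the T input tokens (dimension D) is split into num_heads slices of
    size D // num_heads; output token h concatenates slice h of every input token.
    """
    num_tokens, dim = len(tokens), len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("token dimension must be divisible by num_heads")
    d = dim // num_heads
    mixed = []
    for h in range(num_heads):
        row: Vector = []
        for t in range(num_tokens):
            row.extend(tokens[t][h * d:(h + 1) * d])
        mixed.append(row)
    return mixed


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def ffn(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Two-layer ReLU MLP (biases omitted for brevity)."""
    hidden = [max(0.0, v) for v in matvec(w1, x)]
    return matvec(w2, hidden)


def per_token_ffn(tokens: List[Vector], w1s: List[Matrix], w2s: List[Matrix]) -> List[Vector]:
    """Apply a separate FFN, with its own weights, to each token position."""
    return [ffn(x, w1, w2) for x, w1, w2 in zip(tokens, w1s, w2s)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: T = 4 tokens of dimension D = 8, with num_heads == T (assumption).
T, D, HIDDEN = 4, 8, 16
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
mixed = token_mixing(x, num_heads=T)  # still T tokens of dimension T * (D // T) == D
w1s = [random_matrix(HIDDEN, D, rng) for _ in range(T)]
w2s = [random_matrix(D, HIDDEN, rng) for _ in range(T)]
out = per_token_ffn(mixed, w1s, w2s)
print(len(out), len(out[0]))  # 4 8
```

Unlike self-attention, the regrouping step computes no T×T score matrix, and the per-token FFNs keep separate parameters for each position rather than sharing one FFN across all tokens.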
