SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity
Yukun Zhang, Si Dong, Xu Wang, Bo Chen, Qinglin Jia, Shengzhe Wang, Jinlong Jiao, Runhan Li, Jiaqing Liu, Chaoyi Ma, Ruiming Tang, Guorui Zhou, Han Li, Kun Gai

TL;DR
This paper introduces SMES, a scalable sparse Mixture-of-Experts framework with progressive expert routing for multi-task recommendation, effectively balancing capacity and inference costs in large-scale industrial systems.
Contribution
The paper proposes SMES, a framework that targets two obstacles to using sparse MoE for multi-task recommendation: exploded expert activation and expert load skew. Addressing them enables scalable and efficient model deployment.
Findings
Supports over 400 million users on the Kuaishou platform
Achieves a 0.29% GAUC improvement in online experiments
Increases user watch time by 0.31%
Abstract
Industrial recommender systems typically rely on multi-task learning to estimate diverse user feedback signals and aggregate them for ranking. Recent advances in model scaling have shown promising gains in recommendation. However, naively increasing model capacity imposes prohibitive online inference costs and often yields diminishing returns for sparse tasks with skewed label distributions. This mismatch between uniform parameter scaling and heterogeneous task capacity demands poses a fundamental challenge for scalable multi-task recommendation. In this work, we investigate parameter sparsification as a principled scaling paradigm and identify two critical obstacles when applying sparse Mixture-of-Experts (MoE) to multi-task recommendation: exploded expert activation that undermines instance-level sparsity and expert load skew caused by independent task-wise routing. To address these…
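The "exploded expert activation" obstacle named in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: the function names (`top_k_indices`, `simulate_independent_routing`) are hypothetical, and uniform random scores stand in for learned gates. Each task independently picks its own top-k experts, and the sketch counts how many distinct experts a single instance ends up activating across all tasks.

```python
import random


def top_k_indices(scores, k):
    """Return the indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def simulate_independent_routing(num_instances, num_tasks, num_experts, k, seed=0):
    """Simulate task-wise top-k routing where every task routes independently.

    Returns (union_sizes, expert_load):
      union_sizes[i] is the number of distinct experts activated for instance i,
      expert_load[e] is the number of instances that activated expert e.
    """
    rng = random.Random(seed)
    union_sizes = []
    expert_load = [0] * num_experts
    for _ in range(num_instances):
        activated = set()
        for _ in range(num_tasks):
            # Assumption: random scores replace a learned per-task gate.
            scores = [rng.random() for _ in range(num_experts)]
            activated.update(top_k_indices(scores, k))
        union_sizes.append(len(activated))
        for expert in activated:
            expert_load[expert] += 1
    return union_sizes, expert_load


if __name__ == "__main__":
    sizes, load = simulate_independent_routing(
        num_instances=200, num_tasks=6, num_experts=16, k=2
    )
    print("mean experts activated per instance:", sum(sizes) / len(sizes))
    print("per-expert load:", load)
```

Although each task activates only k experts, the per-instance union can grow up to min(num_experts, num_tasks * k), which is what erodes instance-level sparsity. With uniform random gates the load tally is balanced by construction, so `expert_load` only shows where skew would be measured if learned gates were plugged in.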
Taxonomy
Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning
