SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity

Yukun Zhang; Si Dong; Xu Wang; Bo Chen; Qinglin Jia; Shengzhe Wang; Jinlong Jiao; Runhan Li; Jiaqing Liu; Chaoyi Ma; Ruiming Tang; Guorui Zhou; Han Li; Kun Gai

arXiv:2602.09386 · cs.IR · February 11, 2026

TL;DR

This paper introduces SMES, a scalable sparse Mixture-of-Experts framework with progressive expert routing for multi-task recommendation, effectively balancing capacity and inference costs in large-scale industrial systems.

Contribution

The paper proposes SMES, a sparse MoE framework for multi-task recommendation that addresses two obstacles, exploded expert activation and expert load skew, enabling scalable and efficient model deployment.

Findings

1. Supports over 400 million users on the Kuaishou platform
2. Achieves a 0.29% GAUC improvement in online experiments
3. Increases user watch time by 0.31%
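GAUC is not defined on this page. A common definition in industrial recommendation (group AUC) computes AUC separately for each user and averages the results, weighting each user by their number of impressions (some variants weight by clicks instead). The sketch below assumes that impression-weighted definition; whether SMES reports exactly this variant is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; tied scores count as half a correct ordering."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(user_ids: Sequence[str], labels: Sequence[int], scores: Sequence[float]) -> float:
    """Impression-weighted mean of per-user AUC (assumed GAUC variant).

    Users whose impressions are all positive or all negative have no defined
    AUC and are skipped, a common convention.
    """
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for u, y, s in zip(user_ids, labels, scores):
        groups[u].append((y, s))
    num, den = 0.0, 0
    for rows in groups.values():
        ys = [y for y, _ in rows]
        if 0 < sum(ys) < len(ys):  # user has both positives and negatives
            weight = len(rows)  # impression count as the weight
            num += weight * auc(ys, [s for _, s in rows])
            den += weight
    return num / den if den else float("nan")


# Toy data (hypothetical): user "c" has only positives and is skipped.
users = ["a", "a", "a", "b", "b", "c", "c"]
labels = [1, 0, 0, 1, 0, 1, 1]
preds = [0.9, 0.2, 0.4, 0.3, 0.6, 0.8, 0.7]
print(gauc(users, labels, preds))  # 0.6  (user a: AUC 1.0 x 3, user b: AUC 0.0 x 2)
```

Grouping by user means the metric rewards ranking each user's own candidates well, which is closer to what a per-user feed ranker is asked to do than a single global AUC.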

Abstract

Industrial recommender systems typically rely on multi-task learning to estimate diverse user feedback signals and aggregate them for ranking. Recent advances in model scaling have shown promising gains in recommendation. However, naively increasing model capacity imposes prohibitive online inference costs and often yields diminishing returns for sparse tasks with skewed label distributions. This mismatch between uniform parameter scaling and heterogeneous task capacity demands poses a fundamental challenge for scalable multi-task recommendation. In this work, we investigate parameter sparsification as a principled scaling paradigm and identify two critical obstacles when applying sparse Mixture-of-Experts (MoE) to multi-task recommendation: exploded expert activation that undermines instance-level sparsity and expert load skew caused by independent task-wise routing. To address these…
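One plausible reading of the two obstacles: if each of T tasks independently picks its own top-k experts from a shared pool of E experts, the experts that must actually run for a single instance are the union of those picks, which can grow toward min(T·k, E) and erode instance-level sparsity; independent routing also gives the tasks no shared signal for spreading traffic, so a few experts can become hot. The toy sketch below illustrates both effects under that assumption. It is not the SMES routing scheme (the abstract is truncated before describing it), and the names `top_k`, `route_independently`, `load_imbalance` and the gate scores are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Toy illustration with hypothetical names and scores; not the SMES router.


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest gate scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def route_independently(
    task_scores: Sequence[Sequence[float]], k: int
) -> Tuple[List[List[int]], Set[int]]:
    """Each task runs its own top-k gate over a shared expert pool.

    Returns the per-task choices and the union of experts that must be
    executed for this instance, which is what drives inference cost.
    """
    per_task = [top_k(scores, k) for scores in task_scores]
    union = set().union(*per_task)
    return per_task, union


def load_imbalance(
    batch_choices: Sequence[Sequence[Sequence[int]]], num_experts: int
) -> float:
    """Max-to-mean ratio of routing assignments per expert over a batch.

    1.0 means perfectly balanced; larger values mean a few experts are hot.
    """
    counts = Counter(e for inst in batch_choices for task in inst for e in task)
    loads = [counts.get(e, 0) for e in range(num_experts)]
    mean = sum(loads) / num_experts
    return max(loads) / mean if mean > 0 else 0.0


# Hypothetical gate scores for one instance: 3 tasks x 6 experts, k = 2.
scores = [
    [0.9, 0.8, 0.1, 0.1, 0.1, 0.1],  # task 0 -> experts 0, 1
    [0.1, 0.1, 0.7, 0.6, 0.1, 0.1],  # task 1 -> experts 2, 3
    [0.9, 0.1, 0.1, 0.1, 0.5, 0.1],  # task 2 -> experts 0, 4
]
per_task, union = route_independently(scores, k=2)
print(per_task)  # [[0, 1], [2, 3], [0, 4]]
print(len(union))  # 5 experts run, although each task asked for only 2
print(load_imbalance([per_task], num_experts=6))  # 2.0: expert 0 is hot, expert 5 idle
```

Per-task sparsity (k = 2) thus does not bound per-instance compute: here 5 of 6 experts execute, which is the kind of gap the abstract describes between task-wise routing and instance-level sparsity.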

Taxonomy

Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning