HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, Guorui Zhou

TL;DR
This paper introduces HoME, a novel multi-gate expert framework designed to address common issues in industry-scale multi-task learning with Mixture-of-Experts, improving balance and performance across tasks.
Contribution
The paper proposes HoME, a new multi-gate expert architecture that mitigates expert collapse, degradation, and underfitting in multi-task MoE models for large-scale industry applications.
Findings
HoME effectively reduces expert collapse and underfitting.
It improves task performance balance across multiple tasks.
The framework demonstrates practical efficiency in industry settings.
Abstract
In this paper, we present the practical problems and the lessons learned at short-video services from Kuaishou. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which always introduces some shared and specific experts for each task and then uses gate networks to measure related experts' contributions. Although the MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performances in our iteration: (1) Expert Collapse: We found that experts' output distributions are significantly different, and some experts have over 90% zero activations with ReLU, making it hard for gate networks to assign fair weights to balance experts. (2) Expert Degradation: Ideally, the shared-expert aims to provide predictive information for all tasks simultaneously. Nevertheless, we find that some shared-experts are occupied by…
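To make the setup in the abstract concrete, here is a minimal sketch of a gated mixture over ReLU experts, with a helper that measures the fraction of zero activations per expert (the symptom described under Expert Collapse). This is not the HoME architecture or the authors' code. All names (`make_layer`, `gated_mixture`, `zero_fraction`), the layer sizes, and the artificially "dead" expert with a large negative bias are assumptions chosen for illustration.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    # W is a list of rows; returns W @ x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def make_layer(rng, n_in, n_out, bias=0.0):
    # Hypothetical helper: random weights, constant bias.
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return W, [bias] * n_out

def expert_forward(layer, x):
    # One-layer expert with ReLU activation.
    return relu(linear(layer[0], layer[1], x))

def gated_mixture(experts, gate, x):
    # The gate produces one softmax weight per expert; the output is the
    # weighted sum of the expert outputs.
    outs = [expert_forward(e, x) for e in experts]
    weights = softmax(linear(gate[0], gate[1], x))
    mixed = [sum(w * o[k] for w, o in zip(weights, outs))
             for k in range(len(outs[0]))]
    return mixed, weights

def zero_fraction(layer, batch):
    # Share of exactly-zero activations an expert emits over a batch.
    zeros = total = 0
    for x in batch:
        out = expert_forward(layer, x)
        zeros += sum(1 for v in out if v == 0.0)
        total += len(out)
    return zeros / total

rng = random.Random(0)
n_in, n_out = 8, 4
shared_expert = make_layer(rng, n_in, n_out)
specific_expert = make_layer(rng, n_in, n_out)
# Assumption for illustration: a large negative bias forces every ReLU to zero.
dead_expert = make_layer(rng, n_in, n_out, bias=-100.0)
experts = [shared_expert, specific_expert, dead_expert]
gate = make_layer(rng, n_in, len(experts))  # one task's gate network

batch = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(32)]
mixed, weights = gated_mixture(experts, gate, batch[0])

for name, layer in zip(["shared", "specific", "dead"], experts):
    print(name, "zero-activation fraction:", zero_fraction(layer, batch))
print("gate weights for first sample:", weights)
```

In this toy setup the dead expert reports a zero-activation fraction of 1.0, yet the gate can still assign it a non-trivial weight, since the gate sees only the input and not the expert's output. Tracking this statistic per expert is one simple way to spot the kind of imbalance the abstract describes.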
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Mixture of Experts
