Autonomy-of-Experts Models
Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

TL;DR
The paper introduces Autonomy-of-Experts, a new MoE paradigm where experts self-select based on activation norms, removing routers and improving expert selection and learning efficiency in language models.
Contribution
It proposes a novel MoE approach where experts autonomously select themselves, enhancing expert utilization and learning without relying on a router.
Findings
AoE outperforms traditional router-based MoE models on language tasks.
The gains hold for language models pre-trained at scales from 700M to 4B parameters.
The overhead of pre-computing activations for every expert is reduced through a low-rank weight factorization.
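A rough way to see why a low-rank pre-computation keeps the overhead small is to count multiply-accumulates per token. The sketch below is illustrative only: the function name `precompute_macs`, the dimensions, and the assumption that each expert's first projection is factorized into a `d_model x rank` matrix followed by a `rank x d_ffn` matrix are not taken from this page.

```python
def precompute_macs(d_model, d_ffn, n_experts, rank=None):
    """Multiply-accumulates per token spent before any expert is selected.

    Assumption: every expert must run its first projection on the token so
    that activation norms can be compared. With rank=None that projection is
    a full (d_model x d_ffn) matrix; otherwise only the (d_model x rank)
    factor runs for all experts, and the rest is deferred to selected experts.
    """
    width = d_ffn if rank is None else rank
    return n_experts * d_model * width


# Hypothetical sizes, chosen only for illustration.
full = precompute_macs(d_model=1024, d_ffn=4096, n_experts=8)
low_rank = precompute_macs(d_model=1024, d_ffn=4096, n_experts=8, rank=128)
print(full, low_rank, full // low_rank)
```

Under these assumed sizes, the all-expert pre-computation shrinks by a factor of `d_ffn / rank`; the experts that are actually selected still pay for the remaining projections.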
Abstract
Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort.…
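The selection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `aoe_layer`, the weight names (`w_down`, `w_up`, `w_p`, `w_o`), the gated feed-forward form, and the softmax weighting over the selected norms are all hypothetical choices made here.

```python
import numpy as np


def silu(z):
    # Numerically stable SiLU: z * sigmoid(z), with sigmoid written via tanh.
    return z * 0.5 * (1.0 + np.tanh(z / 2.0))


def aoe_layer(x, experts, top_k):
    """Router-free expert selection for a single token x of shape (d_model,).

    Each expert is a dict of (assumed) weights:
      w_down (d_model, r), w_up (r, d_ffn), w_p (d_model, d_ffn), w_o (d_ffn, d_model).
    """
    # 1. Every expert pre-computes a cheap internal activation.
    cached = [x @ e["w_down"] for e in experts]
    norms = np.array([np.linalg.norm(c) for c in cached])

    # 2. Experts are ranked by activation norm; only the top_k continue.
    selected = np.argsort(-norms)[:top_k]

    # 3. Assumed weighting: softmax over the norms of the selected experts.
    shifted = norms[selected] - norms[selected].max()
    weights = np.exp(shifted) / np.exp(shifted).sum()

    # 4. Selected experts finish the forward pass, reusing the cached
    #    activation; all other experts abort here.
    out = np.zeros_like(x, dtype=float)
    for w, i in zip(weights, selected):
        e = experts[i]
        hidden = silu(cached[i] @ e["w_up"]) * (x @ e["w_p"])
        out += w * (hidden @ e["w_o"])
    return out, selected
```

Note that no router parameters appear anywhere: the ranking signal comes from the experts' own weights, which is the coupling between decision and execution that the abstract argues for.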
Taxonomy
Topics: Ethics and Social Impacts of AI · Risk Perception and Management
Methods: Mixture of Experts · Attentive Walk-Aggregating Graph Neural Network
