Autonomy-of-Experts Models

Ang Lv; Ruobing Xie; Yining Qian; Songhao Wu; Xingwu Sun; Zhanhui Kang; Di Wang; Rui Yan

arXiv:2501.13074 · cs.CL · June 2, 2025

TL;DR

The paper introduces Autonomy-of-Experts (AoE), a Mixture-of-Experts (MoE) paradigm in which experts select themselves based on the norms of their internal activations. This removes the router and leads to better expert selection and more effective learning in language models.

Contribution

It proposes a novel MoE approach where experts autonomously select themselves, enhancing expert utilization and learning without relying on a router.

Findings

01

AoE outperforms traditional MoE models in language tasks.

02

The improvement holds for language models pre-trained at scales from 700M to 4B parameters.

03

A low-rank weight factorization reduces the overhead of pre-computing expert activations.
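The overhead point in the third finding can be made concrete with rough arithmetic. The sketch below counts the multiply-accumulate operations needed for every expert to project one token, first to a full hidden width and then to a smaller rank. All sizes (`n_experts`, `d_model`, `d_hidden`, `rank`) are hypothetical values chosen for illustration and are not taken from the paper, and the helper `precompute_macs` is an assumption of this example.

```python
def precompute_macs(n_experts, d_model, width):
    """Multiply-accumulates for every expert to project one token to `width` dims."""
    return n_experts * d_model * width

# Hypothetical sizes, for illustration only (not the paper's configuration).
n_experts, d_model, d_hidden, rank = 8, 1024, 4096, 128

# Cost if each expert pre-computes a full-width activation for the token.
full = precompute_macs(n_experts, d_model, d_hidden)

# Cost if each expert pre-computes only a rank-`rank` projection instead.
low_rank = precompute_macs(n_experts, d_model, rank)

ratio = full / low_rank
print(full, low_rank, ratio)
```

Under these assumed sizes, the pre-computation shrinks by a factor of `d_hidden / rank`. The saving applies only to the step that all experts must run; the selected experts still pay for the rest of their forward pass.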

Abstract

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort.…
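The selection rule described in the abstract (pre-compute, rank by activation norm, let only the top experts continue) can be sketched for a single token as follows. This is a minimal illustration and not the authors' implementation: the two-matrix expert with a ReLU, the softmax over the selected norms used to mix outputs, the toy weights, and every name (`make_toy_expert`, `aoe_forward`, `w_in`, `w_out`) are assumptions made for this example.

```python
import math

def matvec(x, w):
    """Row vector x (length d_in) times matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def make_toy_expert(d_model, d_hidden, scale):
    """Hypothetical expert with deterministic toy weights (illustration only)."""
    w_in = [[scale * (1.0 if (i + j) % 2 == 0 else -1.0) for j in range(d_hidden)]
            for i in range(d_model)]
    w_out = [[1.0 / d_hidden] * d_model for _ in range(d_hidden)]
    return {"w_in": w_in, "w_out": w_out}

def aoe_forward(x, experts, top_k):
    # Step 1: every expert pre-computes its internal activation for the token.
    pre = [matvec(x, e["w_in"]) for e in experts]
    # Step 2: rank experts by the L2 norm of that activation; no router is involved.
    norms = [l2_norm(h) for h in pre]
    ranked = sorted(range(len(experts)), key=lambda i: norms[i], reverse=True)
    chosen = ranked[:top_k]
    # Assumption: mix the chosen experts with a softmax over their norms.
    m = max(norms[i] for i in chosen)
    exps = [math.exp(norms[i] - m) for i in chosen]
    weights = [e / sum(exps) for e in exps]
    # Step 3: only the chosen experts finish the forward pass; the others abort,
    # so their w_out is never touched.
    out = [0.0] * len(x)
    for wgt, i in zip(weights, chosen):
        hidden = [max(0.0, a) for a in pre[i]]  # ReLU is an assumption
        y = matvec(hidden, experts[i]["w_out"])
        out = [o + wgt * yi for o, yi in zip(out, y)]
    return out, chosen

experts = [make_toy_expert(4, 8, s) for s in (0.1, 2.0, 0.1, 3.0)]
token = [1.0, -0.5, 0.25, 2.0]
output, chosen = aoe_forward(token, experts, top_k=2)
print(chosen, output)
```

In this toy setup the experts built with larger `scale` produce larger activation norms, so they are the ones selected. Note that step 1 runs for every expert, which is the pre-computation cost the paper sets out to reduce.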

Videos

Autonomy-of-Experts Models · SlidesLive

Taxonomy

Topics: Ethics and Social Impacts of AI · Risk Perception and Management

Methods: Mixture of Experts · Attentive Walk-Aggregating Graph Neural Network