GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

Bo Lv; Chen Tang; Zifan Zheng; Bohao Yang; Kun Zhao; Ning Liao; Xiaoxing Wang; Feiyu Xiong; Zhiyu Li; Nayu Liu; Jingchi Jiang

arXiv:2501.07890·cs.CL·January 6, 2026

Open Access

TL;DR

GRAPHMOE adds a self-rethinking mechanism with recurrent routing that interconnects the experts in an MoE network, improving language model reasoning and outperforming other LoRA-based models on benchmark datasets.

Contribution

The paper proposes GRAPHMOE, a novel MoE architecture with a self-rethinking mechanism and recurrent routing, enhancing cognitive depth and reasoning in language models.

Findings

01 Outperforms existing LoRA-based models on benchmark datasets

02 Achieves state-of-the-art performance in language modeling tasks

03 Introduces a recurrent routing strategy that enhances expert interaction

Abstract

Traditional Mixture-of-Experts (MoE) networks benefit from using multiple smaller expert models rather than a single large network. However, these experts typically operate independently, which leaves open the question of whether interconnecting them could improve the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method that aims to augment the cognitive depth of language models via a self-rethinking mechanism built on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art…
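
The recurrent routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how routing state is carried between steps, so the residual feedback loop, the top-k gating, and every name here (`LoRAExpert`, `recurrent_moe`, `steps`, `top_k`) are assumptions chosen for illustration. It only shows the general pattern of routing the same token through LoRA-style experts several times, so that later rounds see the output of experts chosen in earlier rounds.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    mx = max(z)
    e = [math.exp(x - mx) for x in z]
    s = sum(e)
    return [x / s for x in e]


class LoRAExpert:
    """Hypothetical expert: a low-rank update x -> B(Ax).

    The frozen base weight is omitted, and B is random rather than
    zero-initialised so the toy example produces a visible update.
    """

    def __init__(self, dim, rank, rng):
        self.A = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


def recurrent_moe(x, experts, router, steps=3, top_k=2):
    """Route x through the experts `steps` times (assumed scheme).

    Each round re-computes the gates from the current hidden state, so
    the experts picked in one round influence routing in the next.
    """
    h = list(x)
    history = []
    for _ in range(steps):
        gates = softmax(matvec(router, h))
        chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
        norm = sum(gates[i] for i in chosen)
        update = [0.0] * len(h)
        for i in chosen:
            out = experts[i](h)
            weight = gates[i] / norm  # renormalise over the selected experts
            update = [u + weight * o for u, o in zip(update, out)]
        h = [a + b for a, b in zip(h, update)]  # residual connection
        history.append(chosen)
    return h, history


rng = random.Random(0)
dim, rank, n_experts = 8, 2, 4
experts = [LoRAExpert(dim, rank, rng) for _ in range(n_experts)]
router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]
h, history = recurrent_moe(x, experts, router)
```

Here `history` records which experts were selected at each step, which makes it easy to inspect whether routing changes as the hidden state is updated.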

Taxonomy

Topics: Cognitive Science and Education Research

Methods: Mixture of Experts