SDG-MoE: Signed Debate Graph Mixture-of-Experts

Stepan Kulibaba; Kirill Labzin; Artem Dzhalilov; Roman Pakhomov; Oleg Svidchenko; Alexander Gasnikov; Aleksei Shpilman

arXiv:2605.08322·cs.LG·May 13, 2026

TL;DR

SDG-MoE is a sparse Mixture-of-Experts architecture in which the active experts exchange signed messages in a lightweight, iterative deliberation step before their outputs are aggregated. The reported gains include a 19.8% improvement in validation perplexity over baseline models.

Contribution

The paper proposes a signed debate graph for MoE layers: two learned interaction matrices over the active experts (a support graph and a critique graph), combined with an iterative deliberation step that runs before final aggregation.

Findings

01. SDG-MoE outperforms baseline models by 19.8% in validation perplexity.

02. SDG-MoE achieves the best external perplexity on the WikiText-103, C4, and Paloma datasets.

03. The theoretical analysis indicates that the deliberation process is stable and adds low overhead.

Abstract

Sparse MoE models achieve a good balance between capacity and compute by routing each token to a small subset of experts. However, in most MoE architectures, once a token is routed, the selected experts process it independently and their outputs are combined via a weighted sum. This leaves open whether enabling communication among them could improve performance. While prior work has raised this question, direct interaction among the active routed experts remains underexplored. In this paper, we propose SDG-MoE (Signed Debate Graph Mixture-of-Experts), a novel architecture that adds a lightweight, iterative deliberation step before final aggregation. SDG-MoE introduces three components: (i) two learned interaction matrices over the active experts, a support graph $A^{+}$ and a critique graph $A^{-}$, capturing reinforcing and corrective influences; (ii) a signed message-passing step that…
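To make the idea in the abstract concrete, the following is a minimal sketch of top-k routing followed by a signed deliberation step and the usual weighted-sum aggregation. The abstract is truncated before it specifies the message-passing rule, so the update used here, $h_i \leftarrow h_i + \eta \sum_{j \neq i} (A^{+}_{ij} - A^{-}_{ij})\, h_j$, is an assumption chosen for illustration and not the paper's formulation. The function names, the step size `step_size`, the number of steps, and the fixed matrices `a_pos` and `a_neg` (which the paper learns) are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    idx = order[:k]
    weights = softmax([router_logits[i] for i in idx])
    return idx, weights


def signed_deliberation(outputs, a_pos, a_neg, steps=2, step_size=0.1):
    """Assumed update rule (not taken from the paper):
    h_i <- h_i + step_size * sum_{j != i} (A+_ij - A-_ij) * h_j
    Support edges pull expert i toward expert j; critique edges push it away.
    """
    k, dim = len(outputs), len(outputs[0])
    h = [list(v) for v in outputs]
    for _ in range(steps):
        new_h = []
        for i in range(k):
            row = list(h[i])
            for j in range(k):
                if j == i:
                    continue
                signed = a_pos[i][j] - a_neg[i][j]
                for d in range(dim):
                    row[d] += step_size * signed * h[j][d]
            new_h.append(row)
        h = new_h  # synchronous update: all experts read the previous state
    return h


def aggregate(outputs, weights):
    """Standard MoE aggregation: gate-weighted sum of expert outputs."""
    dim = len(outputs[0])
    return [sum(w * v[d] for w, v in zip(weights, outputs)) for d in range(dim)]


# Toy example: 4 experts, top-2 routing, 2-dimensional outputs.
router_logits = [0.2, 1.5, -0.3, 0.9]
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

idx, weights = top_k_route(router_logits, k=2)
active = [expert_outputs[i] for i in idx]

# Placeholder interaction matrices over the 2 active experts (learned in the paper).
a_pos = [[0.0, 0.5], [0.5, 0.0]]
a_neg = [[0.0, 0.2], [0.2, 0.0]]

baseline = aggregate(active, weights)
debated = aggregate(signed_deliberation(active, a_pos, a_neg), weights)
print("active experts:", idx)
print("baseline output:", baseline)
print("after deliberation:", debated)
```

With both matrices set to zero, `signed_deliberation` leaves the expert outputs unchanged, so the sketch reduces to the standard independent-experts MoE layer that the abstract describes as the baseline.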
