SDG-MoE: Signed Debate Graph Mixture-of-Experts
Stepan Kulibaba, Kirill Labzin, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman

TL;DR
SDG-MoE is a sparse Mixture-of-Experts architecture in which the routed experts exchange structured, iterative messages before their outputs are aggregated. The authors report a 19.8% improvement in validation perplexity over baseline models.
Contribution
The paper proposes a signed debate graph: learned support and critique interactions among the active experts, applied in a lightweight, iterative deliberation step before final aggregation.
Findings
SDG-MoE outperforms baseline models by 19.8% in validation perplexity.
SDG-MoE achieves the best external perplexity on the WikiText-103, C4, and Paloma datasets.
A theoretical analysis confirms the stability and low overhead of the deliberation process.
Abstract
Sparse MoE models achieve a good balance between capacity and compute by routing each token to a small subset of experts. However, in most MoE architectures, once a token is routed, the selected experts process it independently and their outputs are combined via a weighted sum. This leaves open whether enabling communication among them could improve performance. While prior work has raised this question, direct interaction among the active routed experts remains underexplored. In this paper, we propose SDG-MoE (Signed Debate Graph Mixture-of-Experts), a novel architecture that adds a lightweight, iterative deliberation step before final aggregation. SDG-MoE introduces three components: (i) two learned interaction matrices over the active experts, a support graph and a critique graph, capturing reinforcing and corrective influences; (ii) a signed message-passing step that…
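To make the contrast in the abstract concrete, the sketch below shows the standard top-k weighted-sum aggregation next to one possible form of a signed message-passing step. It is an illustration only, not the authors' implementation: the abstract is truncated before the update rule is stated, so the function `signed_deliberation`, the residual update `h + alpha * (support - critique) @ h`, and the parameters `steps` and `alpha` are all assumptions introduced here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def topk_route(router_logits, k):
    # Indices of the k highest-scoring experts and their renormalized gates.
    idx = np.argsort(router_logits)[-k:][::-1]
    gates = softmax(router_logits[idx])
    return idx, gates

def standard_moe(expert_outputs, gates):
    # Baseline described in the abstract: the selected experts act
    # independently and their outputs (k, d) are combined by a weighted sum.
    return gates @ expert_outputs

def signed_deliberation(expert_outputs, support, critique, steps=1, alpha=0.1):
    # HYPOTHETICAL update rule (not taken from the paper): each active expert
    # adds messages from experts that support it and subtracts messages from
    # experts that critique it. `support` and `critique` are (k, k) matrices.
    h = expert_outputs.copy()
    signed = support - critique
    for _ in range(steps):
        h = h + alpha * (signed @ h)
    return h

# Usage: route one token to 2 of 4 experts, deliberate, then aggregate.
rng = np.random.default_rng(0)
logits = np.array([0.1, 2.0, -1.0, 1.5])
idx, gates = topk_route(logits, k=2)
outputs = rng.normal(size=(2, 8))        # stand-in for the active experts' outputs
support = np.array([[0.0, 0.3], [0.3, 0.0]])
critique = np.array([[0.0, 0.1], [0.2, 0.0]])
refined = signed_deliberation(outputs, support, critique, steps=2)
y_baseline = standard_moe(outputs, gates)
y_debate = standard_moe(refined, gates)
```

With both interaction matrices set to zero, the deliberation step leaves the expert outputs unchanged and the sketch reduces to the baseline weighted sum.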
