Expert Selections In MoE Models Reveal (Almost) As Much As Text

Amir Nuriyev; Gabriel Kulp

arXiv:2602.04105 · cs.CL · March 16, 2026

Open Access

TL;DR

This paper demonstrates that routing decisions in mixture-of-experts language models can be used to accurately reconstruct original text, revealing significant information leakage and raising privacy concerns.

Contribution

The paper introduces a text-reconstruction attack on MoE models that outperforms prior logistic-regression attacks, showing that expert routing decisions leak substantial information about the input text.

Findings

01

A 3-layer MLP achieves 63.1% top-1 accuracy in token reconstruction, improving on prior logistic-regression attacks.

02

A transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) from expert selections on 32-token OpenWebText sequences.

03

Adding noise reduces but does not eliminate the leakage of information.
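
The top-1 figures above can be read as the fraction of positions where the decoder's highest-ranked guess equals the true token; top-k relaxes this to "the true token is among the k highest-ranked guesses". The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `top_k_accuracy` and the toy word lists are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_tokens, k):
    """Fraction of positions whose true token appears in the first k ranked guesses.

    ranked_predictions: one list of candidate tokens per position, best guess first.
    true_tokens: the ground-truth token for each position.
    """
    if len(ranked_predictions) != len(true_tokens):
        raise ValueError("need exactly one ranked list per true token")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_tokens)
        if truth in ranked[:k]
    )
    return hits / len(true_tokens)


# Hypothetical toy data: four positions, three ranked guesses each.
ranked = [
    ["the", "a", "an"],
    ["cat", "dog", "car"],
    ["sat", "set", "sit"],
    ["on", "in", "at"],
]
truth = ["the", "dog", "sit", "by"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only "the" is ranked first
top3 = top_k_accuracy(ranked, truth, k=3)  # "the", "dog", "sit" are within the top 3
print(top1, top3)
```

Top-k accuracy never decreases as k grows, which is why a top-10 figure is always at least as high as the corresponding top-1 figure.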

Abstract

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE…
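
To make "expert selections" concrete, the sketch below shows one plausible way the routing decisions for a single token could be turned into an input vector for a reconstruction model. It assumes standard top-k routing and a multi-hot encoding per MoE layer; the expert count, the value of k, the router logits, and all function names are hypothetical and are not taken from the paper.

```python
def top_k_experts(router_logits, k):
    """Return the indices of the k largest router logits, in ascending index order."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def multi_hot(selected, num_experts):
    """Encode a set of selected expert indices as a 0/1 vector of length num_experts."""
    vec = [0] * num_experts
    for index in selected:
        vec[index] = 1
    return vec


def routing_features(per_layer_logits, k):
    """Concatenate one multi-hot vector per MoE layer into a feature vector for one token."""
    features = []
    for logits in per_layer_logits:
        features.extend(multi_hot(top_k_experts(logits, k), len(logits)))
    return features


# Hypothetical toy setup: 2 MoE layers, 4 experts per layer, top-2 routing.
logits_for_token = [
    [0.1, 2.0, -1.0, 1.5],  # layer 0 routes to experts 1 and 3
    [3.0, 0.2, 0.9, -0.5],  # layer 1 routes to experts 0 and 2
]
features = routing_features(logits_for_token, k=2)
print(features)  # [0, 1, 0, 1, 1, 0, 1, 0]
```

Note that the vector contains only which experts were chosen, not the router logits or the token itself; an attacker in the scenarios described above would train a decoder on vectors of this kind paired with known tokens.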

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques