SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute

Bowen Shen; Yuyue Chen; Peng Yang; Bin Zhang; Xi Zhang; Zoe L. Jiang

arXiv:2601.06790·cs.CR·January 13, 2026

TL;DR

SecMoE is a communication-efficient, privacy-preserving framework for two-party MoE inference that supports much larger models while reducing communication overhead.

Contribution

The paper proposes a Select-Then-Compute approach to secure MoE inference that improves both privacy and efficiency, enabling larger models with less communication and computation.

Findings

1. Scales to 63× larger models with only a 15.2× increase in runtime.
2. Reduces communication by 1.8× to 7.1× compared with the state of the art (SOTA).
3. Achieves a 1.3× to 3.8× speedup over existing protocols.

Abstract

Privacy-preserving Transformer inference has gained attention due to the potential leakage of private information. Despite recent progress, existing frameworks still fall short of practical model scales, with gaps of up to a hundredfold. One possible way to close this gap is the Mixture of Experts (MoE) architecture, which has emerged as a promising technique for scaling up model capacity with minimal overhead. However, current secure two-party computation (2PC) protocols let the server homomorphically compute the FFN layer with its plaintext model weights. In the MoE setting, this could reveal to the server which expert is activated, leaking token-level information about the client's input. Naively evaluating all the experts before selection would protect privacy, but it nullifies MoE sparsity and incurs the heavy computational overhead that sparse MoE seeks to avoid. To address the…
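The trade-off described in the abstract can be illustrated with a plaintext toy model. The sketch below is not the SecMoE protocol and contains no cryptography. It only contrasts the number of expert evaluations under the naive "compute all, then select" strategy with the number under selecting first. Top-1 routing, the toy dimensions, and every function name here are assumptions made for illustration.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def make_expert(dim, hidden, rng):
    """Hypothetical toy expert: a two-layer FFN stored as nested lists."""
    w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2


def run_expert(expert, x):
    """Apply the FFN: linear, ReLU, linear."""
    w1, w2 = expert
    h = [max(0.0, v) for v in matvec(w1, x)]
    return matvec(w2, h)


def route(gate, x):
    """Assumed top-1 routing: index of the highest gate score."""
    scores = matvec(gate, x)
    return max(range(len(scores)), key=scores.__getitem__)


def compute_all_then_select(experts, gate, x):
    """Naive strategy: evaluate every expert, then keep the routed output.
    The cost grows with the number of experts, so sparsity is lost."""
    outputs = [run_expert(e, x) for e in experts]
    return outputs[route(gate, x)], len(experts)


def select_then_compute(experts, gate, x):
    """Select first, then evaluate only the routed expert.
    In this plaintext sketch the index is visible; a secure protocol
    would have to keep it hidden from the server."""
    idx = route(gate, x)
    return run_expert(experts[idx], x), 1


rng = random.Random(0)
dim, hidden, num_experts = 4, 8, 8
experts = [make_expert(dim, hidden, rng) for _ in range(num_experts)]
gate = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
x = [rng.uniform(-1, 1) for _ in range(dim)]

y_all, cost_all = compute_all_then_select(experts, gate, x)
y_sel, cost_sel = select_then_compute(experts, gate, x)
print("expert evaluations:", cost_all, "vs", cost_sel)
```

Both strategies return the same output for a token, but the naive one performs one expert evaluation per expert in the layer. The difficulty the abstract points to is getting the cost of the second strategy without exposing the routed index to the server.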

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Adversarial Robustness in Machine Learning