CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing

Yifan Zhou; Tianshi Xu; Jue Hong; Ye Wu; Meng Li

arXiv:2511.01197·cs.CR·November 12, 2025


TL;DR

CryptoMoE is a privacy-preserving, scalable inference framework for mixture-of-experts (MoE) models. It balances expert loads to hide routing decisions and adds new secure protocols, cutting latency by 2.8-3.5x and communication by 2.9-4.3x relative to dense baselines with minimal accuracy loss.

Contribution

CryptoMoE is the first framework to enable private, efficient, and accurate MoE inference, combining balanced expert routing with novel secure protocols.

Findings

01. Achieves 2.8-3.5x latency reduction over dense baselines.

02. Reduces communication by 2.9-4.3x with minimal accuracy loss.

03. Demonstrates effectiveness on large-scale MoE models.

Abstract

Private large language model (LLM) inference based on cryptographic primitives offers a promising path towards privacy-preserving deep learning. However, existing frameworks only support dense LLMs like LLaMA-1 and struggle to scale to mixture-of-experts (MoE) architectures. The key challenge comes from securely evaluating the dynamic routing mechanism in MoE layers, which may reveal sensitive input information if not fully protected. In this paper, we propose CryptoMoE, the first framework that enables private, efficient, and accurate inference for MoE-based models. CryptoMoE balances expert loads to protect expert routing information and proposes novel protocols for secure expert dispatch and combine. CryptoMoE also develops a confidence-aware token selection strategy and a batch matrix multiplication protocol to improve accuracy and efficiency further. Extensive experiments on…
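The load-balancing idea in the abstract can be illustrated with a small plaintext sketch. This is not the authors' protocol: it involves no cryptography, and the function name `balanced_dispatch`, the top-1 routing rule, the fixed `capacity`, the padding marker `pad_id`, and the use of the gate score as the "confidence" are all assumptions made for illustration. It only shows why giving every expert the same number of slots makes the per-expert load independent of the input, and how a confidence-based rule might decide which tokens keep their slot.

```python
from typing import List, Tuple


def balanced_dispatch(
    gate_scores: List[List[float]],  # gate_scores[t][e]: router score of token t for expert e
    capacity: int,                   # hypothetical fixed number of slots per expert
    pad_id: int = -1,                # hypothetical marker for a dummy (padding) slot
) -> List[List[int]]:
    """Assign tokens to experts so that every expert receives exactly `capacity` slots."""
    num_experts = len(gate_scores[0])

    # Top-1 routing (an assumption): each token requests its highest-scoring expert.
    requests: List[List[Tuple[float, int]]] = [[] for _ in range(num_experts)]
    for token_id, scores in enumerate(gate_scores):
        expert = max(range(num_experts), key=lambda e: scores[e])
        requests[expert].append((scores[expert], token_id))

    slots: List[List[int]] = []
    for expert in range(num_experts):
        # Confidence-based selection (an assumption): when an expert is
        # oversubscribed, keep the tokens with the highest gate scores.
        kept = sorted(requests[expert], key=lambda st: (-st[0], st[1]))[:capacity]
        token_ids = [token_id for _, token_id in kept]
        # Pad undersubscribed experts so the load is the same for every expert.
        token_ids += [pad_id] * (capacity - len(token_ids))
        slots.append(token_ids)
    return slots


scores = [
    [0.9, 0.1],  # token 0 prefers expert 0
    [0.8, 0.2],  # token 1 prefers expert 0
    [0.6, 0.4],  # token 2 prefers expert 0 (lowest confidence, dropped at capacity 2)
    [0.3, 0.7],  # token 3 prefers expert 1
]
print(balanced_dispatch(scores, capacity=2))  # [[0, 1], [3, -1]]
```

In this sketch every expert processes exactly `capacity` entries, so an observer who sees only the per-expert load learns nothing about which experts the tokens actually selected. In a private inference setting the assignment itself would also have to be computed and applied under a secure protocol, which this plaintext example does not attempt.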


Videos

CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing (SlidesLive)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques