MoE-Beyond: Learning-Based Expert Activation Prediction on Edge Devices

Nishant Gavhane; Arush Mehrotra; Rohit Chawla; Peter Proenca

arXiv:2508.17137·cs.LG·August 26, 2025

TL;DR

MoE-Beyond introduces a learning-based expert activation predictor for edge devices. It raises the reported GPU cache hit rate from 17% to 72%, which helps large-scale MoE models run within tight memory constraints.

Contribution

This work presents a lightweight transformer-based predictor trained on expert activation traces. It outperforms heuristic caching baselines on cache hit rate and generalizes to unseen prompts.

Findings

01

Achieves 97.5% accuracy in expert activation prediction.

02

Improves GPU cache hit rate from 17% to 72%.

03

Outperforms heuristic caching strategies on cache hit rate.
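
To make the cache hit rate metric concrete, the sketch below replays a trace of per-step expert activations against a fixed-size cache and counts how often a needed expert is already resident. It is an illustration only, not the paper's evaluation harness: the function name `simulate_hit_rate`, the `prefetch` callback, and the toy trace are all hypothetical, and plain LRU eviction stands in for a heuristic baseline (it is not MoE-Infinity's policy).

```python
from collections import OrderedDict

def simulate_hit_rate(trace, capacity, prefetch=None):
    """Replay `trace` (a list of expert-id lists, one per decoding step)
    against an LRU cache holding at most `capacity` experts.

    `prefetch(step)` is a hypothetical hook: it returns the expert ids a
    predictor wants loaded before step `step` runs. With prefetch=None the
    cache is purely reactive (load on miss).
    """
    cache = OrderedDict()  # expert id -> True, ordered oldest to newest
    hits = 0
    total = 0

    def touch(expert):
        # Mark as most recently used, evicting the oldest entry if full.
        if expert in cache:
            cache.move_to_end(expert)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[expert] = True

    for step, needed in enumerate(trace):
        if prefetch is not None:
            for expert in prefetch(step):
                touch(expert)
        for expert in needed:
            total += 1
            if expert in cache:
                hits += 1
            touch(expert)
    return hits / total if total else 0.0

# Toy trace that alternates between two expert pairs, with room for one pair.
toy_trace = [[0, 1], [2, 3], [0, 1], [2, 3]]
reactive = simulate_hit_rate(toy_trace, capacity=2)
# An "oracle" predictor that always prefetches exactly what is needed next.
oracle = simulate_hit_rate(toy_trace, capacity=2, prefetch=lambda s: toy_trace[s])
print(reactive, oracle)
```

On this toy trace the reactive cache thrashes while the oracle prefetcher always hits. A learned predictor would fall between those extremes, depending on how accurate its predictions are.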

Abstract

The deployment of large-scale Mixture-of-Experts (MoE) models on edge devices presents significant challenges due to memory constraints. While MoE architectures enable efficient utilization of computational resources by activating only a subset of experts per inference, they require careful memory management to operate efficiently in resource-constrained environments. Traditional heuristic-based expert caching strategies such as MoE-Infinity struggle to maintain high cache hit rates as model parameters scale. In this work, we introduce MoE-Beyond, a learning-based expert activation predictor trained to predict expert activations during autoregressive decoding. By framing the task as a multi-label sequence prediction problem, we train a lightweight transformer model on 66 million expert activation traces extracted from the LDJnr-Puffin dataset [5] using the DeepSeek-V2-Chat-Lite MoE model. Our…
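
The multi-label framing mentioned in the abstract can be sketched as follows. Each decoding step activates several experts at once, so the target is a multi-hot vector and each expert gets its own sigmoid output. This is a minimal illustration under stated assumptions, not the authors' training code: `NUM_EXPERTS` and `TOP_K` are placeholder values (not the model's real configuration), the helper names are hypothetical, and binary cross-entropy is assumed as a typical multi-label loss.

```python
import math

NUM_EXPERTS = 8  # placeholder value, not the real expert count
TOP_K = 2        # placeholder value, not the real number of active experts

def multi_hot(active_experts, num_experts=NUM_EXPERTS):
    """Encode the set of experts activated at one step as a 0/1 vector."""
    target = [0.0] * num_experts
    for expert in active_experts:
        target[expert] = 1.0
    return target

def bce_loss(logits, target):
    """Mean binary cross-entropy over experts, one sigmoid per expert."""
    total = 0.0
    for z, y in zip(logits, target):
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)

def predict_top_k(logits, k=TOP_K):
    """Return the ids of the k highest-scoring experts, in ascending order."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:k])

# One step where experts 1 and 3 fired, scored by hypothetical predictor logits.
target = multi_hot([1, 3], num_experts=4)
logits = [0.1, 2.0, -1.0, 1.5]
print(predict_top_k(logits, k=2), bce_loss(logits, target))
```

The predicted top-k set is what a cache manager would prefetch before the next step. The loss treats each expert as an independent binary decision, which is what distinguishes multi-label prediction from single-class classification.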
