Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis

Zehua Pei; Hui-Ling Zhen; Lancheng Zou; Xianzhi Yu; Wulong Liu; Sinno Jialin Pan; Mingxuan Yuan; Bei Yu

arXiv:2502.04416 · cs.LG · April 24, 2026

TL;DR

This paper introduces an analytical post-training method to efficiently convert dense feed-forward networks into sparse Mixture-of-Experts architectures using minimal data, significantly reducing inference costs.

Contribution

It presents a novel activation pattern analysis framework that enables rapid FFN-to-MoE restructuring without extensive retraining or large datasets.

Findings

01. Achieves up to 1.17x speedup in compute-bound scenarios.

02. Requires only minutes of processing and 2000 samples for fine-tuning.

03. Outperforms existing methods that need substantially more resources.

Abstract

Scaling large language models (LLMs) improves performance but significantly increases inference costs, with feed-forward networks (FFNs) consuming the majority of computational resources. While Mixture-of-Experts (MoE) architectures can reduce this cost through sparse activation, restructuring existing dense models into MoEs typically requires extensive retraining on hundreds of billions of tokens. We propose an analytical post-training framework that rapidly restructures FFNs into sparse MoE architectures using only a small calibration dataset. The method analyzes neuron activation patterns to partition neurons into always-active shared experts and conditionally activated routed experts, then constructs a router analytically from representative neuron statistics, enabling immediate deployment or optional lightweight fine-tuning. This approach applies both to dense models and…
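The pipeline the abstract describes (activation statistics, a shared/routed partition, then a router built without training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `restructure_ffn` and `route`, the top-k binary activation markers, the greedy balanced grouping, and the choice of the most frequently active remaining neurons as representatives are all assumptions made for illustration.

```python
import numpy as np

def restructure_ffn(acts, w_in, n_shared, n_experts, k_active):
    """Hypothetical sketch. acts: (tokens, neurons) calibration activations;
    w_in: (d_model, neurons) input projection of the FFN."""
    # Binary markers: 1 where a neuron is among the k_active largest
    # |activations| for a token (assumed definition of "active").
    top = np.argpartition(-np.abs(acts), k_active - 1, axis=1)[:, :k_active]
    markers = np.zeros(acts.shape, dtype=np.float64)
    np.put_along_axis(markers, top, 1.0, axis=1)
    rate = markers.mean(axis=0)  # per-neuron activation rate

    # Shared expert: the neurons that are active most often.
    order = np.argsort(-rate, kind="stable")
    shared = np.sort(order[:n_shared])
    rest = order[n_shared:]  # remaining neurons, by descending rate
    if len(rest) % n_experts != 0:
        raise ValueError("remaining neurons must split evenly across experts")
    cap = len(rest) // n_experts

    # Assumed representatives: the highest-rate remaining neurons.
    reps = rest[:n_experts]
    cols = markers[:, rest].T   # (n_rest, tokens)
    cents = markers[:, reps].T  # (n_experts, tokens)
    dist = ((cols[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)

    # Greedy balanced grouping: each representative seeds its own expert,
    # then closest (neuron, expert) pairs are filled up to capacity.
    experts = [[int(r)] for r in reps]
    assigned = np.zeros(len(rest), dtype=bool)
    assigned[:n_experts] = True
    for flat in np.argsort(dist, axis=None, kind="stable"):
        i, e = divmod(int(flat), n_experts)
        if not assigned[i] and len(experts[e]) < cap:
            experts[e].append(int(rest[i]))
            assigned[i] = True

    # Analytical router (assumption): reuse each representative neuron's
    # input weights, so an expert's score is its representative's response.
    router_w = w_in[:, reps]  # (d_model, n_experts)
    return shared, experts, router_w

def route(x, router_w, top_k):
    """Pick the top_k routed experts per token by |router score|."""
    scores = np.abs(x @ router_w)
    return np.argsort(-scores, axis=-1)[..., :top_k]

# Toy usage with random data standing in for a calibration set.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))        # 64 tokens, d_model = 8
w_in = rng.normal(size=(8, 24))     # 24 hidden neurons
acts = np.maximum(x @ w_in, 0.0)    # ReLU stand-in for the real activation
shared, experts, router_w = restructure_ffn(acts, w_in, 4, 4, 6)
chosen = route(x, router_w, top_k=2)
```

In this sketch the shared neurons run for every token, while only the experts listed in `chosen` run for a given token, which is where the compute saving would come from. A real model would also need the matching slices of the output projection and, for gated FFNs, the gate projection.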

Code & Models

Repositories

jarvispei/CMoE (GitHub)
