MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts

Ivan Novikov

arXiv:2511.21089·cs.LG·November 27, 2025

TL;DR

This paper presents MLPMoE, a training-free method to convert dense MLPs in transformer models into static mixture-of-experts structures, reducing parameters and computational costs with minimal impact on performance.

Contribution

MLPMoE introduces a novel, deterministic tensor slicing approach to transform dense MLPs into static MoE structures without training or calibration data.

Findings

01 Parameter reduction of about 20% with minimal perplexity increase

02 The transformation keeps perplexity within 0.05% of the original dense model

03 Operates post hoc, with no gradient updates or router training

Abstract

Large Language Models (LLMs) are predominantly deployed as dense transformers, where every parameter in every feed-forward block is activated for every token. While architecturally simple, this is computationally inefficient, since inference costs scale linearly with parameter count. Recent upcycling methods such as MoEfication, CMoE, ToMoE, and MoORE reveal that much of the useful computation lives in sparse, semi-modular substructures inside dense feed-forward networks, but these approaches typically rely on clustering, activation profiling, singular value decomposition, or custom routing that requires calibration data. This paper introduces MLPMoE (MLP Mixture-of-Experts), a training-free, deterministic transformation that restructures the dense MLP in transformer blocks into a static, high-cardinality mixture of experts. The transformation uses simple tensor slicing and summation,…
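The slicing-and-summation idea can be illustrated with a minimal sketch. This is not the paper's code: it assumes a SwiGLU-style gated MLP (the abstract does not name the MLP variant), uses NumPy, and every name in it (`dense_mlp`, `slice_into_experts`, `static_moe`) is hypothetical. It shows only that cutting the hidden dimension into contiguous blocks and summing the per-block outputs reproduces the dense output, because the activation acts elementwise on the hidden units. Any pruning or sparsity step is left out.

```python
import numpy as np

def silu(x):
    # SiLU activation, applied elementwise.
    return x / (1.0 + np.exp(-x))

def dense_mlp(x, w_gate, w_up, w_down):
    # Assumed SwiGLU-style MLP.
    # x: (tokens, d_model); w_gate, w_up: (d_model, d_ff); w_down: (d_ff, d_model)
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def slice_into_experts(w_gate, w_up, w_down, num_experts):
    # Cut the hidden dimension d_ff into equal contiguous blocks.
    # Each block of hidden units becomes one small "expert" MLP.
    d_ff = w_gate.shape[1]
    assert d_ff % num_experts == 0, "d_ff must divide evenly (sketch assumption)"
    width = d_ff // num_experts
    experts = []
    for e in range(num_experts):
        cols = slice(e * width, (e + 1) * width)
        experts.append((w_gate[:, cols], w_up[:, cols], w_down[cols, :]))
    return experts

def static_moe(x, experts):
    # No router: every expert runs on every token and the outputs are summed.
    return sum(dense_mlp(x, g, u, d) for g, u, d in experts)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 8, 32, 4
x = rng.standard_normal((5, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))

experts = slice_into_experts(w_gate, w_up, w_down, num_experts)
dense_out = dense_mlp(x, w_gate, w_up, w_down)
moe_out = static_moe(x, experts)

# The two outputs should agree up to floating-point rounding.
max_abs_diff = float(np.max(np.abs(dense_out - moe_out)))
print(f"max |dense - moe| = {max_abs_diff:.3e}")
```

With all experts active the restructured block is numerically equivalent to the dense one, so any savings would have to come from dropping or sparsifying individual experts afterwards, which this sketch does not attempt.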

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Advanced Graph Neural Networks