DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

Chenyang Song; Weilin Zhao; Xu Han; Chaojun Xiao; Yingfa Chen; Zhiyuan Liu

arXiv:2605.10933 · cs.LG · May 21, 2026

TL;DR

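DECO introduces a sparse Mixture-of-Experts architecture that matches the performance of dense Transformers with the same total parameter budget while activating only a fraction of its experts per token, making it practical for end-side devices where both computation and storage are constrained.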

Contribution

The paper proposes DECO, a sparse MoE architecture that combines adaptive ReLU-based routing with learnable expert-wise scaling, a new activation function (NormSiLU), and a simplified expert design, matching dense-model performance at a lower computational cost.
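The abstract describes NormSiLU only as an activation function that normalizes its inputs before the SiLU operator. The sketch below is one possible reading, under stated assumptions: it uses RMS normalization over the hidden dimension with no learnable gain and a small epsilon, and the function names `rms_normalize` and `norm_silu` are hypothetical. The actual DECO normalization may differ.

```python
import math
from typing import List

def silu(x: float) -> float:
    """Numerically stable SiLU (swish): x * sigmoid(x)."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))  # exp(-x) <= 1 here, so no overflow
    e = math.exp(x)  # x < 0, so exp(x) <= 1 and cannot overflow
    return x * e / (1.0 + e)

def rms_normalize(x: List[float], eps: float = 1e-6) -> List[float]:
    """Divide by the root-mean-square of x (assumed normalization; no learnable gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def norm_silu(x: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical NormSiLU: normalize the input vector, then apply SiLU element-wise."""
    return [silu(v) for v in rms_normalize(x, eps)]

# Because the input is normalized first, rescaling it barely changes the output.
a = norm_silu([1.0, 2.0, 3.0])
b = norm_silu([10.0, 20.0, 30.0])
print(all(abs(p - q) < 1e-5 for p, q in zip(a, b)))  # -> True
```

Under this assumption the SiLU always sees inputs on a fixed scale regardless of the pre-activation magnitude, which is one plausible way such a design could stabilize downstream statistics; the paper itself should be consulted for the exact formulation.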

Findings

01. DECO activates only 20% of its experts while matching the performance of dense models.

02. DECO outperforms existing MoE baselines in the reported experiments.

03. DECO achieves a 2.93× speedup on the NVIDIA Jetson AGX Orin using a specialized kernel.
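To make the "20% of experts" figure concrete, the sketch below computes an activation ratio under an assumed definition: with ReLU-based routing, a routed expert counts as active for a token when its gate max(0, score) is positive, and the ratio is the fraction of (token, routed expert) pairs that are active. The router scores are toy numbers, not DECO data, and the helper names `active_per_token` and `activation_ratio` are hypothetical.

```python
from typing import List

def active_per_token(router_scores: List[List[float]]) -> List[int]:
    """Number of routed experts whose ReLU gate max(0, s) is positive, per token."""
    return [sum(1 for s in scores if max(0.0, s) > 0.0) for scores in router_scores]

def activation_ratio(router_scores: List[List[float]]) -> float:
    """Fraction of (token, routed expert) pairs with a positive ReLU gate."""
    total_pairs = sum(len(scores) for scores in router_scores)
    return sum(active_per_token(router_scores)) / total_pairs

# Two tokens, five routed experts each (toy router scores).
scores = [
    [0.7, -0.2, -1.0, -0.3, -0.5],
    [0.4, 0.1, -0.6, -0.9, -0.2],
]
print(active_per_token(scores))  # -> [1, 2]
print(activation_ratio(scores))  # -> 0.3
```

Unlike fixed top-k routing, the number of active experts here varies per token (one for the first token, two for the second), which is what lets a ReLU router adapt its compute to the input.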

Abstract

While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks. These bottlenecks hinder efficient end-side deployment, which simultaneously requires high performance, low computational cost, and small storage overhead. To achieve these properties, we present DECO, a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training tokens. DECO uses differentiable and flexible ReLU-based routing enhanced by learnable expert-wise scaling, which adaptively balances the contributions of routed and shared experts. Furthermore, we introduce NormSiLU, an activation function that normalizes its inputs before applying SiLU, producing a more stable trend in the routed-expert activation ratio and a higher…
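The routing described in the abstract can be sketched as follows, as a minimal illustration under assumptions rather than DECO's implementation: each routed expert i gets a gate max(0, w_i · x) instead of a top-k selection, the gate is multiplied by a learnable per-expert scale, and the result is added to the output of always-on shared experts (taken here with weight 1). The exact parametrization of the expert-wise scaling, the class name `ReLURoutedMoE`, and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def add_scaled(acc: Vector, v: Vector, w: float) -> Vector:
    """Return acc + w * v element-wise."""
    return [p + w * q for p, q in zip(acc, v)]

@dataclass
class ReLURoutedMoE:
    router: List[Vector]   # one router weight vector per routed expert
    routed: List[Expert]   # routed experts
    scales: List[float]    # learnable expert-wise scales (assumed form)
    shared: List[Expert]   # shared experts, applied to every token

    def __call__(self, x: Vector) -> Vector:
        out = [0.0] * len(x)
        for expert in self.shared:
            out = add_scaled(out, expert(x), 1.0)
        for w, expert, scale in zip(self.router, self.routed, self.scales):
            gate = max(0.0, dot(w, x))  # ReLU gate: differentiable almost everywhere
            if gate > 0.0:  # inactive experts are skipped, which is where compute is saved
                out = add_scaled(out, expert(x), scale * gate)
        return out

# Toy usage: one identity shared expert and two routed experts.
moe = ReLURoutedMoE(
    router=[[1.0, 0.0], [-1.0, 0.0]],
    routed=[lambda v: [2.0 * a for a in v], lambda v: [-a for a in v]],
    scales=[0.5, 3.0],
    shared=[lambda v: list(v)],
)
print(moe([1.0, 2.0]))  # -> [2.0, 4.0]: only the first routed expert fires
```

Compared with top-k routing, the ReLU gate has no discrete selection step, so gradients reach the router directly, and the number of experts that fire can vary from token to token.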

Code & Models

Repositories

thunlp/DECO (GitHub)