ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Zihao Huang; Jundong Zhou; Xingwei Qu; Qiyang Min; Ge Zhang

arXiv:2601.21420 · cs.LG · January 30, 2026

Open Access

TL;DR

ConceptMoE dynamically merges semantically similar tokens into concept representations, compressing the sequence before it enters the compute-intensive concept model. Under matched activated FLOPs and total parameters, it outperforms a standard MoE while speeding up prefill and decoding.

Contribution

The paper proposes an adaptive token-to-concept compression method for MoE models, in which a learnable chunk module sets merge boundaries from inter-token similarity, and evaluates it on language and vision-language tasks.

Findings

01 Outperforms standard MoE on language and vision-language benchmarks

02 Reduces attention computation and KV cache usage

03 Speeds up both prefill and decoding
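A rough way to see why findings 02 and 03 follow from compression is a back-of-the-envelope scaling estimate. The sketch below is not taken from the paper: it assumes standard full self-attention, where attention-map cost grows quadratically and KV cache size grows linearly with sequence length, and it covers only the layers that operate on the compressed sequence. The function name `concept_model_savings` is hypothetical.

```python
def concept_model_savings(seq_len, ratio):
    """Idealized reduction factors for layers that see the compressed sequence.

    Assumes full self-attention: attention-map cost ~ length**2,
    KV cache size ~ length. These are scaling estimates, not measured results.
    """
    concept_len = seq_len / ratio  # sequence length after compression by R
    return {
        "concept_len": concept_len,
        # Quadratic term shrinks by ratio**2.
        "attention_map_reduction": (seq_len ** 2) / (concept_len ** 2),
        # Linear term shrinks by ratio.
        "kv_cache_reduction": seq_len / concept_len,
    }


print(concept_model_savings(4096, 2))
```

With `seq_len=4096` and `ratio=2`, this idealized estimate gives a 4x smaller attention map and a 2x smaller KV cache for the affected layers. Measured speedups depend on the rest of the architecture.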

Abstract

Large language models allocate uniform computation across all tokens, ignoring that some sequences are trivially predictable while others require deep reasoning. We introduce ConceptMoE, which dynamically merges semantically similar tokens into concept representations, performing implicit token-level compute allocation. A learnable chunk module identifies optimal boundaries by measuring inter-token similarity, compressing sequences by a target ratio $R$ before they enter the compute-intensive concept model. Crucially, the MoE architecture enables controlled evaluation: we reallocate saved computation to match baseline activated FLOPs (excluding attention map computation) and total parameters, isolating genuine architectural benefits. Under these conditions, ConceptMoE consistently outperforms standard MoE across language and vision-language tasks, achieving +0.9 points on language…
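The chunk-and-merge idea in the abstract can be illustrated with a small non-learned sketch. It is not the authors' learnable chunk module. It assumes fixed cosine similarity between adjacent token vectors, places chunk boundaries at the least-similar adjacent pairs until roughly `len(tokens) / ratio` chunks exist, and mean-pools each chunk into one concept vector. All function names and these choices are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def chunk_boundaries(tokens, ratio):
    """Return sorted chunk start indices, targeting about len(tokens)/ratio chunks."""
    n = len(tokens)
    if n == 0:
        return []
    n_chunks = max(1, min(n, round(n / ratio)))
    # sims[i - 1] is the similarity between token i and its predecessor.
    sims = [cosine(tokens[i - 1], tokens[i]) for i in range(1, n)]
    # Assumption: open a new chunk where a token is least similar to its predecessor.
    order = sorted(range(1, n), key=lambda i: sims[i - 1])
    return sorted([0] + order[: n_chunks - 1])


def merge_into_concepts(tokens, starts):
    """Mean-pool the token vectors of each chunk into one concept vector."""
    ends = starts[1:] + [len(tokens)]
    concepts = []
    for s, e in zip(starts, ends):
        chunk = tokens[s:e]
        dim = len(chunk[0])
        concepts.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return concepts


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
starts = chunk_boundaries(tokens, ratio=2)
concepts = merge_into_concepts(tokens, starts)
print(starts, concepts)
```

In this toy input, the first two vectors point in nearly the same direction, as do the last two, so a target ratio of 2 yields two concepts with a boundary at index 2. A learned module would replace both the similarity measure and the boundary rule.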

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques