TL;DR
REAM is a novel method that merges experts in Mixture-of-Experts LLMs to reduce memory usage while maintaining performance, outperforming pruning-based methods on multiple-choice and generative benchmarks.
Contribution
Introduces REAM, a merging technique for experts in MoE LLMs that better preserves performance compared to pruning methods.
Findings
REAM often outperforms pruning baselines such as REAP.
REAM keeps performance close to that of the original models.
Multiple-choice and generative performance trade off against each other, depending on the mix of calibration data.
Abstract
Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches to reducing memory requirements include weight pruning and quantization. Motivated by Router-weighted Expert Activation Pruning (REAP), which prunes experts, we propose a novel method, Router-weighted Expert Activation Merging (REAM). Instead of removing experts, REAM groups them and merges their weights, better preserving original performance. We evaluate REAM against REAP and other baselines across multiple MoE LLMs on diverse multiple-choice (MC) question answering and generative (GEN) benchmarks. Our results reveal a trade-off between MC and GEN performance that depends on the mix of calibration data. By controlling the mix of general, math and…
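The abstract says that REAM groups experts and merges their weights, but it does not give the grouping rule or the merge formula. The following is only a minimal sketch of one plausible reading, not the paper's algorithm: it assumes each expert has a scalar saliency score (here, the mean of router gate value times activation norm over calibration tokens) and that the experts in a group are combined by a saliency-weighted average. The names `saliency`, `merge_group`, and `merge_experts`, and the use of a single weight matrix per expert, are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def saliency(gates: Sequence[float], act_norms: Sequence[float]) -> float:
    """Assumed score: mean of (router gate * activation norm) over the
    calibration tokens routed to one expert. Returns 0.0 if no tokens."""
    if not gates:
        return 0.0
    return sum(g * n for g, n in zip(gates, act_norms)) / len(gates)


def merge_group(weights: Sequence[Matrix], scores: Sequence[float]) -> Matrix:
    """Assumed merge rule: saliency-weighted average of same-shaped matrices."""
    total = sum(scores)
    if total > 0:
        coeffs = [s / total for s in scores]
    else:
        # Fall back to a plain average when no expert in the group was used.
        coeffs = [1.0 / len(weights)] * len(weights)
    rows, cols = len(weights[0]), len(weights[0][0])
    return [
        [sum(c * w[i][j] for c, w in zip(coeffs, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def merge_experts(
    experts: Dict[int, Matrix],
    scores: Dict[int, float],
    groups: List[List[int]],
) -> List[Matrix]:
    """Replace each group of expert ids with one merged expert.
    The groups are taken as given; how to form them is not sketched here."""
    return [
        merge_group([experts[e] for e in g], [scores[e] for e in g])
        for g in groups
    ]
```

With three experts and the groups `[[0, 1], [2]]`, `merge_experts` returns two matrices, so the layer stores two experts instead of three. A pruning method would instead drop the lowest-scoring expert and keep the remaining weights unchanged.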
Code & Models
- 🤗 SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-REAM (model · 123 dl · ♡ 7)
- 🤗 bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (model · 10 dl · ♡ 5)
- 🤗 bknyaz/Qwen3-Coder-Next-REAM (model · 48 dl · ♡ 28)
- 🤗 bknyaz/GLM-4.5-Air-REAM (model · 10 dl)
- 🤗 SamsungSAILMontreal/GLM-4.5-Air-REAP (model · 6 dl)
- 🤗 bknyaz/Qwen3.5-122B-A10B-REAM (model · 9 dl · ♡ 1)
- 🤗 keithnull/Qwen3.6-35B-A3B-REAM-192 (model · 196 dl · ♡ 4)
- 🤗 gratex/Kimi-K2.6-REAM-25 (model · 74 dl)
