REAM: Merging Improves Pruning of Experts in LLMs

Saurav Jha; Maryam Hashemzadeh; Ali Saheb Pasand; Ali Parviz; Min-Joong Lee; Boris Knyazev

arXiv:2604.04356 · cs.AI · April 7, 2026

TL;DR

REAM (Router-weighted Expert Activation Merging) merges experts in Mixture-of-Experts (MoE) LLMs instead of pruning them. This reduces memory usage while largely preserving performance, and it often outperforms pruning-based methods on multiple-choice and generative benchmarks.

Contribution

Introduces REAM, a technique that groups the experts of an MoE LLM and merges their weights, preserving performance better than expert-pruning methods.

Findings

01. REAM often outperforms pruning baselines.
02. REAM maintains performance close to that of the original models.
03. The mix of calibration data creates a trade-off between multiple-choice (MC) and generative (GEN) task performance.

Abstract

Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches to reduce memory requirements include weight pruning and quantization. Motivated by Router-weighted Expert Activation Pruning (REAP), which prunes experts, we propose a novel method, Router-weighted Expert Activation Merging (REAM). Instead of removing experts, REAM groups them and merges their weights, better preserving original performance. We evaluate REAM against REAP and other baselines across multiple MoE LLMs on diverse multiple-choice (MC) question answering and generative (GEN) benchmarks. Our results reveal a trade-off between MC and GEN performance that depends on the mix of calibration data. By controlling the mix of general, math and…
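
The abstract describes REAM only at a high level, so the following is a minimal, hypothetical sketch of the idea of "score experts by router-weighted activations, then merge instead of prune", not the authors' algorithm. Every specific choice here is an assumption: the saliency score is taken in the spirit of REAP (router gate times the L2 norm of the expert output, averaged over the tokens routed to that expert); experts are modeled as single linear maps rather than gated MLPs; the grouping rule (each low-saliency expert joins its most cosine-similar high-saliency expert) and the saliency-weighted weight averaging are illustrative; and the helper names `saliency` and `merge_experts` are invented for this example.

```python
import math
import random

# Hypothetical sketch: experts are modeled as single linear maps (d x d matrices),
# not the gated MLPs used in real MoE layers.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def top_k_gates(router, x, k):
    """Return {expert: gate} for the k highest router probabilities, renormalized."""
    probs = softmax(matvec(router, x))
    top = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)[:k]
    total = sum(probs[j] for j in top)
    return {j: probs[j] / total for j in top}

def saliency(experts, router, tokens, k):
    """REAP-style score (assumed): mean over routed tokens of gate * ||expert output||_2."""
    sums = [0.0] * len(experts)
    counts = [0] * len(experts)
    for x in tokens:
        for j, g in top_k_gates(router, x, k).items():
            out = matvec(experts[j], x)
            sums[j] += g * math.sqrt(sum(v * v for v in out))
            counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def cosine(A, B):
    """Cosine similarity between two weight matrices, compared as flat vectors."""
    a = [w for row in A for w in row]
    b = [w for row in B for w in row]
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)

def merge_experts(experts, router, scores, n_keep):
    """Hypothetical grouping: the n_keep most salient experts become group centers,
    every other expert joins its most cosine-similar center, and each group is
    collapsed into one expert by a saliency-weighted average of weights."""
    order = sorted(range(len(experts)), key=lambda j: scores[j], reverse=True)
    centers = order[:n_keep]
    groups = {c: [c] for c in centers}
    for j in order[n_keep:]:
        best = max(centers, key=lambda c: cosine(experts[j], experts[c]))
        groups[best].append(j)
    merged = []
    for c in centers:
        members = groups[c]
        weights = [scores[j] for j in members]
        total = sum(weights)
        if total == 0:  # fall back to a plain average if no member was ever routed to
            weights, total = [1.0] * len(members), float(len(members))
        rows, cols = len(experts[c]), len(experts[c][0])
        merged.append([[sum(w * experts[j][r][q] for w, j in zip(weights, members)) / total
                        for q in range(cols)] for r in range(rows)])
    # Keep the router rows of the group centers (a simplifying assumption).
    new_router = [router[c] for c in centers]
    return merged, new_router, groups

# Usage on random toy weights: 8 experts, top-2 routing, merged down to 4.
random.seed(0)
d, n_experts = 4, 8
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(n_experts)]
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(64)]
scores = saliency(experts, router, tokens, k=2)
merged, new_router, groups = merge_experts(experts, router, scores, n_keep=4)
print(len(merged), groups)
```

In this toy picture, REAP-style pruning would roughly correspond to keeping only the group centers and discarding the other experts, while merging folds the discarded experts' weights into the survivors. The authors' actual implementation is in the samsungsailmontreal/ream repository.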

Code & Models

Repositories

samsungsailmontreal/ream (GitHub)
