Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks

Andrei Chernov

arXiv:2502.17187·cs.CL·February 25, 2025

TL;DR

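This paper examines how experts behave in an MoE-based LLM on the quiz-based MMLU benchmark. It finds that most experts are never activated during inference, that gating network outputs are much closer to uniform than sparse, and that average performance differs significantly between experts in the same layer.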

Contribution

It takes a first step toward post-evaluation analysis of MoE layer properties by evaluating expert contributions on the MMLU benchmark.

Findings

01. Most experts were never activated during inference.

02. Gating network outputs are close to uniform, indicating lack of sparsity.

03. Expert performance varies significantly within the same layer.

Abstract

Recently, Large Language Models (LLMs) with Mixture of Experts (MoE) layers have gained significant attention. Currently, state-of-the-art LLMs utilize this architecture. There is a substantial amount of research on how to train such models and how to select hyperparameters for this architecture. However, there is a lack of studies focusing on post-evaluation analysis of MoE layer properties. In this paper, we take a first step toward closing this gap by evaluating expert contributions on the quiz-based MMLU benchmark. We show that most experts were never activated during inference on this benchmark. Additionally, the output distribution of gating networks is much closer to uniform than sparse. Finally, we demonstrate that the average performance of some experts within the same layer varies significantly.
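
The two routing measurements named in the abstract (experts that are never activated, and how close the gating output is to uniform) can be illustrated with a minimal sketch. It is not the paper's code: it assumes softmax gating with top-k expert selection, and the function names (`normalized_entropy`, `routing_stats`) and the toy logits are hypothetical. Normalized entropy is used here as one possible way to quantify closeness to uniform (1.0 is exactly uniform, 0.0 is one-hot); the paper may use a different measure.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n): 1.0 for uniform, 0.0 for one-hot."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def top_k_experts(probs, k):
    """Indices of the k experts with the largest gate probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def routing_stats(gate_logits_per_token, k):
    """Summarize routing for one MoE layer (assumed top-k routing).

    gate_logits_per_token: one list of gate logits per token,
    each of length num_experts.
    """
    num_experts = len(gate_logits_per_token[0])
    counts = Counter()
    entropies = []
    for logits in gate_logits_per_token:
        probs = softmax(logits)
        entropies.append(normalized_entropy(probs))
        counts.update(top_k_experts(probs, k))
    never_activated = [e for e in range(num_experts) if counts[e] == 0]
    return {
        "activation_counts": [counts[e] for e in range(num_experts)],
        "never_activated": never_activated,
        "mean_normalized_entropy": sum(entropies) / len(entropies),
    }


# Toy input: 3 tokens, 4 experts, top-1 routing (illustrative values only).
toy_logits = [
    [2.0, 1.0, 0.0, 0.0],
    [1.5, 1.4, 0.1, 0.0],
    [0.0, 3.0, 0.0, 0.0],
]
stats = routing_stats(toy_logits, k=1)
print(stats["never_activated"])
print(round(stats["mean_normalized_entropy"], 3))
```

In this sketch an expert counts as "never activated" when it is not among the top-k choices for any token, so a layer can have unused experts even when the gate probabilities themselves are fairly flat.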

Taxonomy

Topics: Big Data and Business Intelligence · Business Process Modeling and Analysis · Software Engineering Techniques and Practices

Methods: Mixture of Experts