HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

Jia Wei; Zhonghao Zhang; Ping Chen; Qianyang li; Yancheng Pan; Shaoxun Wang; Ziyi Qiu; Longxiang Wang

arXiv:2605.18795·cs.LG·May 20, 2026

TL;DR

HELLoRA introduces an activation-aware low-rank adaptation method for Mixture-of-Experts models, improving efficiency and performance by attaching adapters only to frequently activated experts.

Contribution

The paper proposes a simple yet effective mechanism for parameter-efficient fine-tuning (PEFT) of MoE models: LoRA modules are attached selectively, based on how frequently each expert is activated.
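The selection step can be illustrated with a minimal sketch. It assumes routing decisions have already been recorded on some calibration data as a per-layer list of expert ids, and that a fixed number of experts (`top_k`) is kept per layer. The function name, the trace format, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_hot_experts(routing_trace, top_k):
    """Return, for each layer, the ids of the top_k most frequently routed experts.

    routing_trace: dict mapping layer index -> iterable of expert ids picked by
    the router (hypothetical format, one entry per routed token).
    """
    hot = {}
    for layer, expert_ids in routing_trace.items():
        counts = Counter(expert_ids)
        # most_common orders by count, descending; ties keep first-seen order.
        hot[layer] = sorted(expert for expert, _ in counts.most_common(top_k))
    return hot


# Toy trace: two layers with made-up routing decisions.
trace = {
    0: [0, 3, 3, 1, 3, 0, 2, 3, 0],
    1: [5, 5, 4, 7, 5, 4, 4, 6],
}
hot_experts = select_hot_experts(trace, top_k=2)
print(hot_experts)  # {0: [0, 3], 1: [4, 5]}
```

Under this sketch, LoRA modules would be attached only to the experts listed for each layer, and all other experts would stay frozen without adapters. How the paper measures activation frequency and how many experts it keeps per layer are not stated in the summary above.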

Findings

01

HELLoRA reduces trainable parameters by up to 84.3% compared to LoRA.

02

HELLoRA achieves up to 1.9x training throughput and improves accuracy.

03

HELLoRA outperforms strong PEFT baselines across multiple MoE backbones and tasks.

Abstract

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation. We propose Hot-Experts Layer-level Low-Rank Adaptation (HELLoRA), which attaches LoRA modules only to the most frequently activated experts at each layer. This simple mechanism reduces trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect we attribute to a form of structured regularization that preserves pretrained expert specialization. To stress-test HELLoRA under extreme parameter budgets, we further compose it with LoRI to form HELLoRI, which freezes the up-projection and sparsifies the down-projection. Across three MoE…
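To make the parameter saving concrete, the sketch below counts adapter parameters when LoRA is attached to every expert versus only a subset of hot experts per layer. All sizes are hypothetical toy values, only expert adapters are counted (adapters on attention or router weights are ignored), and the result is not meant to reproduce the figures reported for HELLoRA.

```python
def lora_params(d_in, d_out, rank):
    # One LoRA pair adds a (rank x d_in) matrix and a (d_out x rank) matrix.
    return rank * (d_in + d_out)


def expert_adapter_params(num_layers, adapted_experts, matrices_per_expert,
                          d_model, d_ff, rank):
    # Every expert projection maps between d_model and d_ff, so each
    # adapted matrix costs the same number of LoRA parameters.
    per_matrix = lora_params(d_model, d_ff, rank)
    return num_layers * adapted_experts * matrices_per_expert * per_matrix


# Hypothetical toy configuration: 4 MoE layers, 8 experts per layer,
# 3 projection matrices per expert, LoRA rank 8.
config = dict(num_layers=4, matrices_per_expert=3, d_model=64, d_ff=256, rank=8)

all_experts = expert_adapter_params(adapted_experts=8, **config)
hot_only = expert_adapter_params(adapted_experts=2, **config)
reduction = 1 - hot_only / all_experts

print(all_experts, hot_only, reduction)  # 245760 61440 0.75
```

With a fixed rank, the expert-adapter count scales linearly with the number of adapted experts, so keeping k of E experts per layer removes a fraction 1 - k/E of those parameters in this simplified accounting.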
