HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models
Jia Wei, Zhonghao Zhang, Ping Chen, Qianyang Li, Yancheng Pan, Shaoxun Wang, Ziyi Qiu, Longxiang Wang

TL;DR
HELLoRA introduces an activation-aware low-rank adaptation method for Mixture-of-Experts models, improving efficiency and performance by attaching adapters only to frequently activated experts.
Contribution
It proposes a simple yet effective mechanism for parameter-efficient fine-tuning of MoE models by selectively attaching LoRA modules based on expert activation frequency.
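The selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The source says only that adapters go to the most frequently activated experts at each layer. It does not say how activation frequency is measured or how many experts count as "hot". This sketch therefore assumes two things: a per-layer trace of top-k router choices, and a fixed number `num_hot` of hot experts per layer. The function names and the toy trace are hypothetical.

```python
from typing import Dict, List, Sequence

def count_expert_activations(routed: Sequence[Sequence[int]], num_experts: int) -> List[int]:
    """Count how often each expert index appears in the router's top-k picks for one layer."""
    counts = [0] * num_experts
    for token_experts in routed:  # one entry per token: its top-k expert indices
        for expert in token_experts:
            counts[expert] += 1
    return counts

def select_hot_experts(counts: Sequence[int], num_hot: int) -> List[int]:
    """Return the num_hot most frequently activated experts (ties go to the lower index)."""
    ranked = sorted(range(len(counts)), key=lambda e: (-counts[e], e))
    return sorted(ranked[:num_hot])

def plan_hot_expert_adapters(
    routing_by_layer: Dict[int, Sequence[Sequence[int]]],
    num_experts: int,
    num_hot: int,
) -> Dict[int, List[int]]:
    """Map each MoE layer to the experts that would receive LoRA adapters (hypothetical helper)."""
    return {
        layer: select_hot_experts(count_expert_activations(routed, num_experts), num_hot)
        for layer, routed in routing_by_layer.items()
    }

# Toy routing trace (top-2 routing over 8 experts) standing in for a profiling pass.
routing = {
    0: [[0, 3], [3, 5], [3, 0], [1, 3]],
    1: [[2, 6], [6, 7], [2, 6]],
}
plan = plan_hot_expert_adapters(routing, num_experts=8, num_hot=2)
print(plan)  # {0: [0, 3], 1: [2, 6]}
```

In this sketch, only the experts listed for each layer would be wrapped with LoRA modules. All other experts, and their routers, stay frozen. A real trace would presumably come from running the frozen model over a sample of the fine-tuning data.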
Findings
HELLoRA reduces trainable parameters by up to 84.3% compared with standard LoRA.
HELLoRA achieves up to a 1.9x training-throughput speedup while also improving accuracy.
HELLoRA outperforms strong PEFT baselines across multiple MoE backbones and tasks.
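The arithmetic below shows where the parameter savings come from. It is a back-of-the-envelope sketch only. It counts only adapters on expert FFN projections and assumes a gated (SwiGLU-style) expert with gate, up, and down projections. It also assumes the same number of hot experts in every layer. The configuration values are made up for illustration and are not the paper's models or results.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA pair adds A (rank x d_in) and B (d_out x rank): rank * (d_in + d_out) weights.
    return rank * (d_in + d_out)

def expert_lora_params(d_model: int, d_ff: int, rank: int) -> int:
    # Assumption: gate and up map d_model -> d_ff, down maps d_ff -> d_model.
    return 2 * lora_params(d_model, d_ff, rank) + lora_params(d_ff, d_model, rank)

def total_expert_adapter_params(num_layers: int, experts_with_lora: int,
                                d_model: int, d_ff: int, rank: int) -> int:
    # Same number of adapted experts in every layer (simplifying assumption).
    return num_layers * experts_with_lora * expert_lora_params(d_model, d_ff, rank)

# Hypothetical MoE configuration, chosen only for illustration.
num_layers, num_experts, num_hot = 24, 64, 8
d_model, d_ff, rank = 2048, 1408, 8

full = total_expert_adapter_params(num_layers, num_experts, d_model, d_ff, rank)
hot = total_expert_adapter_params(num_layers, num_hot, d_model, d_ff, rank)
reduction = 1 - hot / full
print(full, hot, f"{reduction:.1%}")  # 127401984 15925248 87.5%
```

Under these assumptions, the expert-adapter savings reduce to 1 - num_hot / num_experts. Adapters on attention or other shared modules would shrink the overall percentage, because those adapters are not affected by expert selection.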
Abstract
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation. We propose Hot-Experts Layer-level Low-Rank Adaptation (HELLoRA), which attaches LoRA modules only to the most frequently activated experts at each layer. This simple mechanism reduces trainable parameters and adapter-induced FLOPs while improving downstream performance, an effect we attribute to a form of structured regularization that preserves pretrained expert specialization. To stress-test HELLoRA under extreme parameter budgets, we further compose it with LoRI to form HELLoRI, which freezes the up-projection and sparsifies the down-projection. Across three MoE…
