LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

Md Kowsher; Haris Mansoor; Nusrat Jahan Prottasha; Ozlem Garibay; Victor Zhu; Zhengping Ji; Chen Chen

arXiv:2604.02338 · cs.LG · April 6, 2026

TL;DR

LiME is a lightweight, parameter-efficient mixture-of-experts approach to multimodal multi-task learning. It reduces trainable parameters and training time while matching or improving on the performance of MoE-PEFT baselines.

Contribution

LiME proposes an expert modulation method that removes the need for a separate adapter per expert and for learned routing, enabling efficient multi-task learning across modalities.

Findings

01

LiME achieves performance comparable to or better than that of MoE-PEFT baselines.

02

LiME reduces trainable parameters by up to 4x.

03

LiME accelerates training by up to 29%.
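
To make the parameter finding concrete, the following is a minimal counting sketch, not the paper's accounting. It assumes a LoRA-style adapter of rank `rank` on a single `d_in` x `d_out` projection, a linear router with `d_in * n_experts` weights for the baseline, and one modulation vector of length `d_out` per expert for the shared variant. All function names and dimensions are hypothetical and chosen only for illustration.

```python
def per_expert_lora_params(d_in, d_out, rank, n_experts):
    """Trainable parameters when every expert owns its own LoRA adapter.

    Assumption: a learned linear router (d_in * n_experts weights) is
    also trained for this layer.
    """
    adapters = n_experts * rank * (d_in + d_out)  # grows linearly with experts
    router = d_in * n_experts
    return adapters + router


def shared_modulated_params(d_in, d_out, rank, n_experts):
    """Trainable parameters for one shared adapter plus per-expert vectors.

    Assumption: each expert contributes a single vector of length d_out
    and routing adds no parameters.
    """
    shared_adapter = rank * (d_in + d_out)  # independent of expert count
    expert_vectors = n_experts * d_out
    return shared_adapter + expert_vectors


if __name__ == "__main__":
    # Hypothetical layer shape, not taken from the paper.
    d_in, d_out, rank = 1024, 1024, 8
    for n_experts in (2, 4, 8):
        baseline = per_expert_lora_params(d_in, d_out, rank, n_experts)
        shared = shared_modulated_params(d_in, d_out, rank, n_experts)
        print(n_experts, baseline, shared, round(baseline / shared, 2))
```

Under these assumptions the adapter term is paid once rather than once per expert, so the gap widens as experts are added. The ratios printed here depend entirely on the assumed shapes and are not the paper's reported numbers.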

Abstract

MoE-PEFT methods combine Mixture of Experts with parameter-efficient fine-tuning for multi-task adaptation, but they require a separate adapter per expert, so trainable parameters scale linearly with expert count and applicability is limited to adapter-based architectures. We propose LiME (Lightweight Mixture of Experts), which achieves expert specialization through lightweight modulation rather than adapter replication. Instead of separate adapters, LiME uses a single shared PEFT module and modulates its output with lightweight expert vectors, reducing expert parameters while generalizing to any PEFT method. Notably, LiME introduces zero-parameter routing by leveraging existing frozen and adapted representations, eliminating the learned router parameters typically required per layer. Theoretically, we prove that (i) more experts preserve more task-relevant information and (ii) modulation…
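
The mechanism described in the abstract can be pictured with a small forward-pass sketch. This is an illustration under stated assumptions, not the authors' implementation: the shared PEFT module is taken to be a LoRA-style low-rank update, modulation is taken to be an element-wise scaling of the adapted output by a gate-weighted mix of expert vectors, and the routing rule (averaging contiguous feature groups of the frozen plus adapted output) is a hypothetical stand-in, since the abstract does not state the actual rule. All function and variable names are invented for this example.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def param_free_gate(h_frozen, h_adapted, n_experts):
    """ASSUMED routing rule with no learned weights.

    Sums the frozen and adapted outputs, splits the features into
    n_experts contiguous groups, and uses each group's mean as a logit.
    """
    combined = [f + a for f, a in zip(h_frozen, h_adapted)]
    group = len(combined) // n_experts
    logits = [
        sum(combined[e * group:(e + 1) * group]) / group
        for e in range(n_experts)
    ]
    return softmax(logits)


def lime_like_layer(x, W0, A, B, expert_vecs):
    """One layer: frozen projection plus a modulated shared low-rank update."""
    h_frozen = matvec(W0, x)               # frozen base weights
    h_adapted = matvec(B, matvec(A, x))    # single shared adapter (ASSUMED LoRA)
    gate = param_free_gate(h_frozen, h_adapted, len(expert_vecs))
    # Gate-weighted mix of expert vectors, one scale per output feature.
    scale = [
        sum(g * vec[j] for g, vec in zip(gate, expert_vecs))
        for j in range(len(h_adapted))
    ]
    out = [f + s * a for f, s, a in zip(h_frozen, scale, h_adapted)]
    return out, gate


if __name__ == "__main__":
    x = [1.0, 2.0]
    W0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # frozen, 4 x 2
    A = [[1.0, 1.0]]                                        # rank-1 down-projection
    B = [[1.0], [0.0], [0.0], [1.0]]                        # rank-1 up-projection
    expert_vecs = [[1.0, 1.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]]
    out, gate = lime_like_layer(x, W0, A, B, expert_vecs)
    print("gate:", gate)
    print("output:", out)
```

In this sketch the only per-expert trainable state is `expert_vecs`; `A` and `B` are shared by all experts, and the gate reuses quantities the layer already computes.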
