LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

Shaoxiang Chen; Zequn Jie; Lin Ma

arXiv:2401.16160 · cs.CV · January 31, 2024 · 6 cites

TL;DR

This paper introduces LLaVA-MoLE, a sparse Mixture of LoRA Experts approach for instruction finetuning Multimodal Large Language Models (MLLMs) that mitigates data conflicts between instruction datasets from different domains and improves performance on mixed-data configurations.

Contribution

The paper proposes a sparse MoE design in which a set of LoRA experts is attached to the MLP layer and each token is routed to its top-1 expert, reducing data conflicts in multimodal instruction finetuning.

Findings

01 LLaVA-MoLE outperforms plain-LoRA baselines when trained on mixed instruction datasets.

02 LLaVA-MoLE performs comparably to, or better than, models trained on twice the data.

03 The sparse MoE approach keeps training and inference costs close to those of plain LoRA.

Abstract

Instruction finetuning on a variety of image-text instruction data is the key to obtaining a versatile Multimodal Large Language Model (MLLM), and different configurations of the instruction data can lead to finetuned models with different capabilities. However, we have discovered that data conflicts are inevitable when mixing instruction data from distinct domains, which can result in performance drops for tasks of a specific domain. To address this issue, we propose to apply an efficient Mixture of Experts (MoE) design, which is a sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs. Within the Transformer layers, we extend the popular Low-Rank Adaptation (LoRA) method by creating a set of LoRA experts specifically for the MLP layer, and route each token to the top-1 expert based on a routing function, allowing adaptive choices for tokens from different domains. Since…
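The routing described in the abstract can be illustrated with a minimal NumPy sketch of a single linear projection carrying several LoRA experts. It is not the authors' implementation. The function name `mole_linear`, the linear router `W_router`, the `alpha / rank` scaling, and all tensor shapes are assumptions chosen for illustration; the abstract only states that each token is routed to its top-1 LoRA expert by a routing function.

```python
import numpy as np

def mole_linear(x, W, A, B, W_router, alpha=1.0):
    """Hypothetical sparse mixture-of-LoRA-experts projection.

    x:        (tokens, d_in)    token hidden states
    W:        (d_in, d_out)     frozen pretrained weight
    A:        (E, d_in, rank)   LoRA down-projections, one per expert
    B:        (E, rank, d_out)  LoRA up-projections, one per expert
    W_router: (d_in, E)         assumed linear routing function
    """
    num_experts, _, rank = A.shape
    logits = x @ W_router                # (tokens, E) routing scores
    expert_idx = logits.argmax(axis=-1)  # top-1 expert for each token
    out = x @ W                          # shared frozen path for all tokens
    for e in range(num_experts):
        mask = expert_idx == e           # tokens assigned to expert e
        if mask.any():
            # Only the selected expert's low-rank update is computed.
            out[mask] += (alpha / rank) * (x[mask] @ A[e] @ B[e])
    return out, expert_idx
```

Because each token passes through exactly one low-rank expert, the per-token compute in this sketch is that of a single LoRA update plus the small router projection, which is consistent with the claim that costs stay close to plain LoRA.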

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention · Byte Pair Encoding · Residual Connection · Adam · Softmax