Mixture of LoRA Experts
Xun Wu, Shaohan Huang, Furu Wei

TL;DR
This paper introduces MoLE, a hierarchical fusion method for LoRA experts that improves upon existing techniques by enhancing performance and flexibility in combining multiple LoRAs across NLP and V&L tasks.
Contribution
The paper proposes the MoLE approach, enabling flexible and effective fusion of multiple LoRAs using hierarchical control and branch selection, outperforming existing methods.
Findings
MoLE achieves superior fusion performance compared to arithmetic merging.
MoLE maintains the generative capabilities of the original models.
Experimental results validate MoLE's effectiveness in NLP and V&L tasks.
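The contrast between arithmetic merging and learned gating can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single-layer setup, the gate parameter `gate_W`, and the choice to feed the gate the concatenated LoRA outputs are all hypothetical choices made for this example.

```python
import numpy as np

def lora_delta(A, B, x):
    """Output of one LoRA branch: B @ A @ x, with A of shape (r, d_in), B of shape (d_out, r)."""
    return B @ (A @ x)

def arithmetic_merge(W, loras, weights, x):
    """Fold fixed-weight LoRA updates into the frozen weight W, then apply it to x."""
    merged = W + sum(w * (B @ A) for w, (A, B) in zip(weights, loras))
    return merged @ x

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gated_fusion(W, loras, gate_W, x, temperature=1.0):
    """Combine LoRA branch outputs with input-dependent weights.

    gate_W (hypothetical learnable parameter) has shape (n, n * d_out) and
    maps the concatenated branch outputs to one logit per LoRA.
    """
    outs = np.stack([lora_delta(A, B, x) for A, B in loras])  # (n, d_out)
    logits = gate_W @ outs.reshape(-1)                        # (n,)
    gates = softmax(logits / temperature)                     # non-negative, sums to 1
    return W @ x + gates @ outs, gates
```

In this sketch the pre-trained weight `W` and the individual LoRA matrices are never modified; only `gate_W` would be trained. When `gate_W` is all zeros the gates are uniform and the result coincides with arithmetic merging using weights `1/n`, which makes fixed-weight averaging a special case of the gated form.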
Abstract
LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques. Due to the modular nature of LoRA's plug-and-play plugins, researchers have delved into the amalgamation of multiple LoRAs to empower models to excel across various downstream tasks. Nonetheless, extant approaches for LoRA fusion grapple with inherent challenges. Direct arithmetic merging may result in the loss of the original pre-trained model's generative capabilities or the distinct identity of LoRAs, thereby yielding suboptimal outcomes. On the other hand, reference tuning-based fusion exhibits limitations concerning the requisite flexibility for the effective combination of multiple LoRAs. In response to these…
Peer Reviews
Decision: ICLR 2024 poster
1. The idea of combining different LoRAs via a gating mechanism is intuitive and novel. 2. The authors perform an extensive set of experiments and show the effectiveness of the technique in both the vision and NLP domains. 3. The authors perform a detailed ablation study to assess the various losses and components.
1. The authors motivate (Section 3.1) the need for a mixture of LoRAs in the vision domain, but it is not clear whether it is also required in the NLP domain (as the marginal improvement in results also suggests). 2. For the NLP domain, the evaluation covers only one classification task (NLI) and no generative task (e.g., summarization or translation). Analogous to the vision domain, it would be great to see what effect MoLE has during generation.
+ The motivation for combining MoE with LoRA is sound. Different from the original LoRA, the LoRA weights from both the attention and MLP layers are regarded as one individual LoRA expert. + A penalty loss is proposed to tackle the gating imbalance issue, so that more LoRA experts are well trained. + For the text-to-image generation task in the V&L domain, the proposed MoLE achieves better average scores. In Figure 9, the generated image follows the text instructions better.
+ Overall, for NLI tasks in the NLP domain, the proposed MoLE shows average performance similar to LoRAhub. + Combining MoE with LoRA seems straightforward. + No other LoRA merging variants are compared in the experiments.
- Learning a gating function to combine LoRA modules is a sensible idea and is generally motivated well in the paper. - The proposed approach does not add too many additional parameters.
Many details in the paper are unclear - The related work in Section 2.2 can be more clearly explained. Particularly, it is claimed that the "arithmetic operation-based fusion" suffers from "identity confusion among multiple LoRAs". This issue needs to be clarified. How does the proposed approach fix this issue? Also, the details of the "reference tuning-based fusion" method are unclear. Is the approach from Gu et al., 2023 comparable to this work? If so, why is this approach not compared against
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems · Machine Learning and Data Classification
