Multi LoRA Meets Vision: Merging multiple adapters to create a multi task model
Ege Kesim, Selahattin Serdar Helli

TL;DR
This paper explores merging multiple LoRA adapters trained on different computer vision tasks to create efficient multi-task models without retraining, showing promising results and potential advantages over traditional finetuning.
Contribution
It demonstrates that merging multiple LoRA adapters is a feasible way to build multi-task vision models: the merged model reduces inference time and needs no additional training, at the cost of a slight loss in performance.
Findings
Merging up to three adapters is effective for multi-task models.
When merged, adapters trained on dissimilar datasets often outperform those trained on similar data.
Merged adapters can sometimes outperform head finetuning in accuracy.
Abstract
Parameter-efficient finetuning (PEFT) methods are widely used in LLMs and in generative models in computer vision. In particular, several of these adapters can be used together during inference to change the behavior of the base model. In this paper we investigate whether multiple LoRA adapters trained on computer vision tasks can be merged and used during inference without loss in performance. If so, multitask models can be created simply by merging different LoRAs. Merging reduces inference time and requires no additional retraining. We trained adapters on six different tasks and evaluated their performance when merged. For comparison we used a model with a frozen backbone and a finetuned head. Our results show that, even with simple merging techniques, creating a multitask model by merging adapters is achievable at the cost of slightly losing performance…
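The abstract does not say which "simple merging techniques" were used, so the following is only a minimal sketch of one common choice: a weighted sum of each adapter's low-rank update folded into the frozen base weight. The function names, the per-adapter weights, and the `alpha / rank` scaling follow the usual LoRA convention and are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lora_delta(A, B, alpha):
    """Dense update of one LoRA adapter: (alpha / r) * B @ A, with r = A.shape[0]."""
    rank = A.shape[0]
    return (alpha / rank) * (B @ A)

def merge_adapters(W, adapters, weights=None):
    """Fold several LoRA adapters into a single weight matrix (hypothetical helper).

    W        : frozen base weight, shape (d_out, d_in)
    adapters : list of (A, B, alpha) with A (r, d_in) and B (d_out, r)
    weights  : optional per-adapter mixing coefficients (assumed: all 1.0)
    """
    if weights is None:
        weights = [1.0] * len(adapters)
    merged = W.copy()
    for w, (A, B, alpha) in zip(weights, adapters):
        merged += w * lora_delta(A, B, alpha)
    return merged

# Toy example: three random adapters on one linear layer.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 6, 2, 4.0
W = rng.normal(size=(d_out, d_in))
adapters = [
    (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), alpha)
    for _ in range(3)
]

W_merged = merge_adapters(W, adapters)

# One matmul with the merged weight matches the base output plus
# every adapter's low-rank branch evaluated separately.
x = rng.normal(size=d_in)
y_merged = W_merged @ x
y_separate = W @ x + sum((a / A.shape[0]) * (B @ (A @ x)) for A, B, a in adapters)
```

Because the adapters are folded into `W_merged` once, inference costs a single matrix multiply per layer regardless of how many adapters were merged, which is the source of the inference-time saving the abstract mentions. Whether the merged updates interfere with one another is the empirical question the paper evaluates.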
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · 3D Surveying and Cultural Heritage
Methods: Balanced Selection
