Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
Yuhang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

TL;DR
This paper introduces Mixture of Low-rank Adapters (MoLA), a novel method to effectively train unified models on heterogeneous data by mitigating conflicts through task-specific adapters and a task-wise decorrelation loss.
Contribution
It proposes two variants, MoLA-Grad and MoLA-Router, to handle target-aware and target-agnostic scenarios, advancing multi-task learning with low-rank adapters.
Findings
MoLA outperforms previous state-of-the-art methods.
MoLA effectively mitigates training conflicts among heterogeneous data.
In-depth analysis reveals the working mechanism of MoLA.
Abstract
Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate the training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore leveraging Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which requires jointly training multiple low-rank adapters and their shared backbone. Specifically, we introduce two variants of MoLA, namely MoLA-Grad and MoLA-Router, to handle the target-aware and target-agnostic scenarios during inference, respectively. The former uses task identifiers to assign a personalized low-rank adapter to each task, disentangling task-specific knowledge into the corresponding adapter and thereby mitigating heterogeneity conflicts. The latter uses a novel Task-wise Decorrelation…
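The target-aware idea described in the abstract (a shared backbone plus one low-rank adapter selected by task identifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name MoLAGradLinear, the tensor shapes, and the LoRA-style initialization (random A, zero B) are assumptions made for illustration, and the training procedure and the decorrelation loss are not modeled here.

```python
import numpy as np


class MoLAGradLinear:
    """Hypothetical linear layer: one shared weight plus one low-rank adapter per task."""

    def __init__(self, d_in, d_out, num_tasks, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone weight, used by every task.
        self.W = rng.normal(scale=0.02, size=(d_out, d_in))
        # Per-task low-rank factors. Assumed LoRA-style init: A random, B zero,
        # so every adapter starts as a no-op on top of the shared weight.
        self.A = rng.normal(scale=0.02, size=(num_tasks, rank, d_in))
        self.B = np.zeros((num_tasks, d_out, rank))

    def forward(self, x, task_id):
        # x has shape (batch, d_in); task_id picks which adapter is applied.
        delta = self.B[task_id] @ self.A[task_id]  # (d_out, d_in), rank <= `rank`
        return x @ (self.W + delta).T


layer = MoLAGradLinear(d_in=8, d_out=4, num_tasks=3, rank=2)
x = np.ones((5, 8))
y_task0 = layer.forward(x, task_id=0)
y_task1 = layer.forward(x, task_id=1)
```

Because each task only ever touches its own A and B factors, task-specific updates stay separated from one another while the shared weight W is still used by all tasks.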
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference
