Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

TL;DR
This paper introduces a Concrete subspace learning method for multi-task model fusion. It reduces interference among task-specific models by identifying a shared low-dimensional subspace through a meta-learning approach.
Contribution
It proposes a bi-level optimization framework using gradient-based meta-learning to find a shared subspace mask, improving multi-task model merging without significant performance loss.
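To make the idea of a learnable subspace mask more concrete, the sketch below samples a relaxed (continuous) binary mask from per-parameter logits and uses it to gate the summed task-specific updates before they are added to the pre-trained weights. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the flat-list parameter layout, the `temperature` and `scale` values, and the choice to apply the mask elementwise to summed task vectors are all hypothetical. The outer meta-learning loop that would update the logits by gradient descent is omitted.

```python
import math
import random


def sample_concrete_mask(logits, temperature=0.5, rng=random):
    """Sample a relaxed Bernoulli (binary Concrete) mask with values in [0, 1].

    Each entry is sigmoid((logit + logistic_noise) / temperature); lower
    temperatures push the entries closer to hard 0/1 decisions.
    """
    mask = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        noise = math.log(u) - math.log(1.0 - u)
        mask.append(1.0 / (1.0 + math.exp(-(logit + noise) / temperature)))
    return mask


def merge_with_mask(pretrained, task_vectors, mask, scale=0.3):
    """Add the masked, scaled sum of task vectors to the pre-trained weights."""
    merged = []
    for i, p in enumerate(pretrained):
        shared = sum(tv[i] for tv in task_vectors)
        merged.append(p + scale * mask[i] * shared)
    return merged


# Hypothetical toy example: three parameters, two task vectors.
rng = random.Random(0)
logits = [4.0, -4.0, 0.0]  # assumed mask logits; in practice these would be learned
mask = sample_concrete_mask(logits, temperature=0.5, rng=rng)
merged = merge_with_mask(
    [0.0, 0.0, 0.0],
    [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0]],
    mask,
)
```

Because the mask is a smooth function of the logits, a loss computed on the merged model can in principle be differentiated with respect to them, which is what makes a gradient-based search over masks possible.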
Findings
Effective interference elimination demonstrated on vision and language tasks
Outperforms existing model merging techniques in experiments
Code availability facilitates reproducibility and further research
Abstract
Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without…
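The "arithmetic operations on task vectors" mentioned in the abstract can be illustrated with a small sketch. A task vector is taken here to be the elementwise difference between fine-tuned and pre-trained weights, and the merged model adds a scaled sum of these differences back to the pre-trained weights. The function names, the flat-list parameter layout, and the `scale` value are assumptions made for illustration only.

```python
def task_vector(pretrained, finetuned):
    """Elementwise difference: fine-tuned weights minus pre-trained weights."""
    return [f - p for p, f in zip(pretrained, finetuned)]


def merge_task_arithmetic(pretrained, finetuned_models, scale=0.3):
    """Add the scaled sum of all task vectors to the pre-trained weights."""
    vectors = [task_vector(pretrained, ft) for ft in finetuned_models]
    summed = [sum(vals) for vals in zip(*vectors)]
    return [p + scale * s for p, s in zip(pretrained, summed)]


# Hypothetical toy weights for one pre-trained model and two fine-tuned models.
theta_0 = [0.0, 1.0, -1.0]
theta_a = [1.0, 1.0, -1.0]
theta_b = [0.0, 3.0, -2.0]
merged = merge_task_arithmetic(theta_0, [theta_a, theta_b], scale=0.5)
print(merged)  # [0.5, 2.0, -1.5]
```

When two task vectors push the same parameter in opposite directions, their sum partly cancels, which is one simple way to picture the interference the abstract refers to.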
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
