TL;DR
MoORE is an SVD-based "model MoE-ization" technique: it turns each pre-trained weight matrix into a Mixture of Orthogonal Rank-one Experts, giving a multi-task adaptation method that resists task conflict and oblivion.
Contribution
The paper proposes MoORE, an SVD-based model MoE-ization method that guarantees the experts' orthogonality and maintains the column space of the original weights, making multi-task adaptation more resistant to task conflict and oblivion.
Findings
MoORE outperforms existing methods in multi-task adaptation.
It effectively resists task conflicts and oblivion.
Experiments demonstrate consistent superiority across datasets.
Abstract
Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel "model MoE-ization" strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right one. We can improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts' orthogonality and maintains the column space of the original…
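The construction described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the additive adjustment of the singular values, the linear-plus-tanh router, the single Householder reflection used as the orthogonal transform, and all names (`router`, `moore_forward`, `R`, `Q`) are assumptions chosen for illustration, and the random values stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 6, 4
W = rng.normal(size=(d_out, d_in))  # stand-in for a pre-trained weight matrix

# Thin SVD: W = U @ diag(s) @ Vt, with orthonormal columns in U and rows in Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = s.shape[0]

# Hypothetical router (assumption): one bounded weight per rank-one expert.
R = 0.1 * rng.normal(size=(r, d_in))

def router(x):
    return np.tanh(R @ x)

# Hypothetical orthogonal transform (assumption): one Householder reflection.
h = rng.normal(size=d_in)
h /= np.linalg.norm(h)
Q = np.eye(d_in) - 2.0 * np.outer(h, h)

def moore_forward(x, Q, router):
    """Apply the adapted layer: U @ diag(s + router(x)) @ (Vt @ Q) @ x."""
    g = router(x)        # sample-dependent adjustment of the singular values
    z = Vt @ (Q @ x)     # coordinates along the transformed right singular vectors
    return U @ ((s + g) * z)

# Expert i is the outer product of left singular vector i and the transformed
# right singular vector i; their Frobenius Gram matrix shows orthogonality.
Vq = Vt @ Q
experts = np.stack([np.outer(U[:, i], Vq[i]) for i in range(r)])
gram = np.einsum("iab,jab->ij", experts, experts)
```

In this sketch, `gram` is the identity up to floating-point error, which is the sense in which the rank-one experts are orthogonal. Every output of `moore_forward` is a combination of the columns of `U`, so it stays in the column space of `W`. With a router that returns zeros and `Q` set to the identity, the layer reduces to `W @ x`.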
