Subspace-Boosted Model Merging
Ronald Skorobogat, Karsten Roth, Mariana-Iuliana Georgescu

TL;DR
This paper introduces Subspace Boosting, a method that improves the merging of multiple expert models by maintaining task vector ranks, raising merging efficacy by more than 10% on vision and language benchmarks.
Contribution
It provides a theoretical and empirical analysis of the limitations of merging many experts, and proposes Subspace Boosting and Higher-Order GSVD (HO-GSVD) to improve merging efficacy and interpretability.
Findings
Subspace Boosting raises merging efficacy for up to 20 experts by over 10%.
Maintains task vector ranks to prevent rank collapse during merging.
Offers a new perspective on task similarity using Higher-Order GSVD (HO-GSVD).
Abstract
Model merging enables the combination of multiple specialized expert models into a single model capable of performing multiple tasks. However, the benefits of merging an increasing amount of specialized experts generally lead to diminishing returns and reduced overall performance gains. In this work, we empirically and theoretically analyze this limitation, proving that for Task Arithmetic-based methods, as more experts are merged, the common information dominates the task-specific information, leading to inevitable rank collapse. To mitigate this issue, we introduce Subspace Boosting, which operates on the singular value decomposed task vector space and maintains task vector ranks. Subspace Boosting raises merging efficacy for up to 20 experts by large margins of more than 10% when evaluated on both vision and language benchmarks. Moreover, we propose employing Higher-Order Generalized…
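As a rough illustration of the idea described in the abstract, the sketch below boosts the tail of a singular-value spectrum. It is not the authors' implementation: the cumulative-sum rule used to locate the cutoff, the parameter name `beta`, and the function name `boost_singular_values` are assumptions made for this example. The input is a plain list of singular values, which in practice would come from an SVD of a merged task vector's weight matrix.

```python
def boost_singular_values(sigmas, beta=0.85):
    """Raise small singular values to the value found at a cutoff position.

    Assumption (not confirmed by the source): the cutoff is the first index
    at which the cumulative sum of the descending singular values reaches a
    fraction `beta` of their total sum.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")

    # Work on a descending copy so the caller's list is left untouched.
    ordered = sorted(sigmas, reverse=True)
    total = sum(ordered)
    if not ordered or total == 0:
        return ordered

    # Locate the cutoff index from the cumulative sum.
    running = 0.0
    cutoff_index = len(ordered) - 1
    for i, s in enumerate(ordered):
        running += s
        if running >= beta * total:
            cutoff_index = i
            break

    # Values above the cutoff are kept; values below it are clamped up to it.
    floor = ordered[cutoff_index]
    return [max(s, floor) for s in ordered]


boosted = boost_singular_values([8.0, 1.0, 0.5, 0.5], beta=0.85)
print(boosted)  # [8.0, 1.0, 1.0, 1.0]
```

With `[8.0, 1.0, 0.5, 0.5]` and `beta=0.85`, the cutoff lands on the second value, so the two smaller values are raised to 1.0 and the leading value is untouched. The boosted spectrum would then replace the original one when the matrix is reconstructed from its singular vectors.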
Peer Reviews
Decision·Submitted to ICLR 2026
- The topic is relevant to the community, addressing a problem in multi-task and model-merging research.
- The proposed Subspace Boosting method is conceptually clear and appears computationally efficient.
- The connection between singular value structure and model merging dynamics is insightful. The analysis of shared versus task-specific subspaces through the singular-value structure is particularly interesting.
- Some points about the tables and the experiment reporting are unclear (see questions).
- The notion of “rank collapse”, which is central to the paper, could benefit from a more formal explanation.
- Algorithm 1 applies a standard SVD step. I think it is misleading to present it in the approach section instead of the algorithm for Subspace Boosting itself.
Minor:
- In Figure 2 (a–c), the y-axis label “Value” should likely be “Stable rank value”, right?
- In line 268, “n” is associated with the shape of V and also to the num
1. This work identifies a critical limitation in existing model merging approaches, rank collapse, and provides empirical evidence to support this finding.
2. The paper is well-organized and easy to follow.
1. **Potential error amplification.** Directly boosting the singular values below the cutoff point may introduce noise or bias. The authors should provide further discussion to justify the rationality of the proposed subspace boosting technique.
2. **Hyperparameter sensitivity.** The method requires manual tuning of the cutoff hyperparameter, which may limit its practicality and robustness in real-world applications.
3. **Unclear connection between HO-GSVD and rank collapse.** While HO-GSVD of
- Unlike prior works that only observed diminishing performance with more merged experts, this study provides a mechanistic explanation from a task vector space perspective: as more experts are merged, task vectors suffer from rank collapse.
- The proposed Subspace Boosting addresses rank collapse in a highly practical manner. Operating via singular value decomposition (SVD) on merged task vectors, it boosts underutilized small singular values to maintain effective rank.
- The third part quantifies rank collapse using only "Stable Rank" and "Cumulative Energy Rank" (for example, Formula 2 defines stable rank as the ratio of the sum of the squared singular values to the square of the largest singular value). However, how generally these two indicators track model merging performance has not been fully demonstrated. The manuscript only demonstrates the negative correlation between the stable rank and performance through experimen
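The two rank measures named in the review above can be sketched in a few lines. The stable rank follows the definition quoted by the reviewer. The cumulative energy rank is written here under an assumption, since the source does not define it: the smallest number of leading singular values whose squared sum reaches a fraction `energy` of the total. The function names and the default of 0.95 are illustrative only.

```python
def stable_rank(sigmas):
    """Sum of squared singular values divided by the largest one squared."""
    if not sigmas or max(sigmas) == 0:
        raise ValueError("need at least one non-zero singular value")
    top = max(sigmas)
    return sum(s * s for s in sigmas) / (top * top)


def cumulative_energy_rank(sigmas, energy=0.95):
    """Smallest k whose top-k squared singular values reach `energy` of the total.

    Assumption: this is a common definition, not one confirmed by the source.
    """
    squared = sorted((s * s for s in sigmas), reverse=True)
    total = sum(squared)
    if total == 0:
        raise ValueError("need at least one non-zero singular value")
    running = 0.0
    for k, e in enumerate(squared, start=1):
        running += e
        if running >= energy * total:
            return k
    return len(squared)


flat = [1.0, 1.0, 1.0, 1.0]      # energy spread evenly over four directions
peaked = [8.0, 1.0, 0.5, 0.5]    # one dominant direction

print(stable_rank(flat))                 # 4.0
print(stable_rank(peaked))               # 1.0234375
print(cumulative_energy_rank(flat))      # 4
print(cumulative_energy_rank(peaked))    # 1
```

A flat spectrum scores the full rank on both measures, while a spectrum dominated by one singular value scores close to 1, which is the kind of pattern the reviews refer to as rank collapse.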
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
