Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion
Bowen Tian, Songning Lai, Yutao Yue

TL;DR
AutoFusion is a novel framework that dynamically fuses model parameters for multi-task learning, enabling adaptable and scalable model integration without pre-trained checkpoints or labeled data.
Contribution
It introduces an unsupervised, end-to-end parameter fusion method that permutes and optimizes model parameters across layers for multi-task learning.
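The core operation behind permutation-based fusion can be shown on a toy two-layer MLP. Reordering the hidden units (the rows of the first weight matrix and bias, plus the matching columns of the second weight matrix) leaves the network's function unchanged. That means one model can be re-ordered to line up with another before their weights are combined. This is a minimal NumPy sketch under stated assumptions, not AutoFusion's implementation. The function names are hypothetical. The permutation is random here, whereas Git Re-Basin chooses it by matching and, per the abstract, AutoFusion optimizes it. The fusion step is a plain average.

```python
import numpy as np

def mlp(params, x):
    """Two-layer MLP: x -> relu(W1 @ x + b1) -> W2 @ h + b2."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

def permute_hidden(params, perm):
    """Reorder hidden units: rows of W1 and b1, and the matching columns of W2."""
    W1, b1, W2, b2 = params
    return (W1[perm], b1[perm], W2[:, perm], b2)

def fuse_after_alignment(params_a, params_b, perm):
    """Hypothetical fusion: permute model B's hidden units, then average with A."""
    aligned_b = permute_hidden(params_b, perm)
    return tuple(0.5 * (pa + pb) for pa, pb in zip(params_a, aligned_b))

def random_params(rng, d_in=4, d_hidden=8, d_out=3):
    return (rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden),
            rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out))

rng = np.random.default_rng(0)
params_a, params_b = random_params(rng), random_params(rng)
x = rng.normal(size=4)

# Assumption: a random permutation stands in for a matched or learned one.
perm = rng.permutation(8)

# Permuting hidden units does not change what the network computes...
print(np.allclose(mlp(params_a, x), mlp(permute_hidden(params_a, perm), x)))  # True
# ...so B can be re-ordered to line up with A before the two are averaged.
fused = fuse_after_alignment(params_a, params_b, perm)
```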
Findings
Outperforms existing methods like Weight Interpolation and ZipIt
Demonstrates effectiveness on benchmark datasets
Offers a scalable and flexible model fusion solution
Abstract
In the rapidly evolving field of deep learning, specialized models have driven significant advancements in tasks such as computer vision and natural language processing. However, this specialization leads to a fragmented ecosystem where models lack the adaptability for broader applications. To overcome this, we introduce AutoFusion, an innovative framework that fuses the parameters of distinct models sharing the same architecture for multi-task learning without pre-trained checkpoints. Using an unsupervised, end-to-end approach, AutoFusion dynamically permutes model parameters at each layer, optimizing the combination through a loss-minimization process that does not require labeled data. We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets, demonstrating superior performance over established methods like Weight Interpolation, Git Re-Basin, and ZipIt. Our…
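For reference, the Weight Interpolation baseline named in the abstract amounts, in its simplest form, to a per-parameter convex combination of two models with identical architectures. The sketch below assumes parameters stored in dictionaries keyed by hypothetical layer names. How the paper configures this baseline, for example which mixing coefficient it uses, is not stated here.

```python
import numpy as np

def interpolate_weights(params_a, params_b, alpha):
    """Return (1 - alpha) * theta_A + alpha * theta_B, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Both models must share the same parameter names and shapes.
    return {name: (1.0 - alpha) * params_a[name] + alpha * params_b[name]
            for name in params_a}

# Hypothetical two-parameter "models" with matching keys and shapes.
params_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
params_b = {"w": np.array([3.0, 6.0]), "b": np.array([1.0])}

merged = interpolate_weights(params_a, params_b, 0.5)
print(merged["w"], merged["b"])  # [2. 4.] [0.5]
```

Because no alignment happens before averaging, this baseline ignores the permutation symmetry of hidden units, which is what alignment-based methods such as Git Re-Basin try to account for.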
Peer Reviews
Decision: Submitted to ICLR 2025
Although I am not an expert in this domain, I believe these strengths should be acknowledged:
+ The paper presents a clear and convincing motivation, effectively setting the stage for the proposed work.
+ There is a notable degree of innovation in the methodology, and the authors have thoroughly reviewed prior approaches, clarifying how their contributions advance the state of the art.
+ The results achieved by the proposed method are impressive, consistently outperforming baselines across a
- Line 225: The sentence appears to be incomplete because it begins with a conditional clause (“If we attempt to…”), which typically requires a main clause to complete the thought. In English, when a sentence starts with “If,” it sets up an expectation that there will be a following statement explaining the result, purpose, or consequence of the condition.
- To further demonstrate the effectiveness of the proposed fusion method, more complex tasks and datasets should be considered, such as dete
- It leverages Mena et al. (2018) to make the permutation matrices in Git Re-Basin differentiable, thus allowing end-to-end training.
- It achieves clear improvements over baseline methods on MNIST and CIFAR.
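The review points to the Sinkhorn operator of Mena et al. (2018) as the mechanism that makes a permutation matrix differentiable. A minimal NumPy sketch of the operator itself follows. It repeatedly normalizes the rows and columns of exp(logits / tau), producing a doubly-stochastic matrix that approaches a permutation matrix as the temperature tau shrinks. The function names and defaults (tau, n_iters) are illustrative assumptions. The Gumbel noise used in Mena et al.'s Gumbel-Sinkhorn is omitted, and nothing here reproduces how AutoFusion wires the operator into its training loss.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`, keeping dimensions."""
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))

def sinkhorn(logits, tau=1.0, n_iters=100):
    """Sinkhorn operator applied to logits / tau.

    Alternately normalizes rows and columns in log space; the result is
    (approximately) doubly stochastic, i.e. a relaxed permutation matrix.
    """
    log_p = logits / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)

rng = np.random.default_rng(0)
soft = sinkhorn(rng.normal(size=(5, 5)), tau=1.0)
print(soft.sum(axis=0).round(6), soft.sum(axis=1).round(6))

# A low temperature pushes the output toward a hard permutation matrix.
near_identity = sinkhorn(3.0 * np.eye(4), tau=0.1)
print(np.allclose(near_identity, np.eye(4), atol=1e-6))  # True
```

In a gradient-based setup the logits would be the trainable quantity, and a hard permutation could be read off at the end, for example with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment` with `maximize=True`).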
Experiments could be improved:
- An analysis of model similarity is needed.
- Baselines that fine-tune the model (trained on one task) jointly on all tasks are needed. They would provide a good reference even though they are not considered fair comparisons.
- LoRA fine-tuning could be considered a fair baseline, since the proposed model learns a permutation matrix per layer, which can essentially be considered low-rank fine-tuning. Thus, adding a comparison to LoRA fine-tuning would pr
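The suggested LoRA baseline adds a trainable low-rank update to each frozen weight matrix, W + (alpha / r) * B @ A, with rank r much smaller than the layer's dimensions. The NumPy sketch below shows only that parameterization (forward pass, merged weight, and trainable-parameter count). The class name, initialization scale, and seed are assumptions, and no training loop is included.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                       # frozen base weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
        self.scale = alpha / r

    def effective_weight(self):
        """Merged weight W + scale * B @ A, e.g. for storing a single matrix."""
        return self.W + self.scale * (self.B @ self.A)

    def __call__(self, x):
        # Apply the low-rank path as two thin matmuls instead of forming B @ A.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def num_trainable(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, r=4)
x = rng.normal(size=32)

# With B initialised to zero, the adapted layer starts out identical to the base layer.
print(np.allclose(layer(x), W @ x))  # True
print(layer.num_trainable())         # 384 = 4 * 32 + 64 * 4
```

The zero initialization of B is the usual LoRA choice: training starts from the base model's behavior, and only r * (d_in + d_out) values per matrix are updated.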
1. The visualization of the method is good.
2. The application of the Sinkhorn operator is innovative in the field of deep model fusion.
1. It would be beneficial to list the number of optimized parameters for each method.
2. The paper lacks related work or experimental results to substantiate the claim in lines 215-218 that "However, this assumption of high similarity falls apart when the models to be merged are trained for different tasks. During merging, we must not only align parameters with similar functions but also strive to retain parameters with distinct functions, enabling the fused model to perform various tasks simultaneously
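The reviewer's request to list the number of optimized parameters per method can be made concrete with a back-of-the-envelope count. The sketch below compares trainable-parameter counts for a fully connected network under explicitly hypothetical assumptions. A soft permutation is parameterized as one n x n logit matrix per hidden layer of width n (the paper's exact parameterization is not given here). LoRA uses a fixed rank on every weight matrix, and Weight Interpolation tunes at most one mixing coefficient. The layer widths are illustrative only.

```python
def count_optimized_params(layer_dims, lora_rank):
    """Hypothetical per-method counts for an MLP with widths [d0, d1, ..., dL]."""
    hidden = layer_dims[1:-1]
    # (d_out, d_in) for each weight matrix between consecutive layers.
    weight_shapes = list(zip(layer_dims[1:], layer_dims[:-1]))
    return {
        # A single mixing coefficient alpha, if it is tuned at all.
        "weight_interpolation": 1,
        # Assumption: one n x n logit matrix per hidden layer of width n.
        "soft_permutation": sum(n * n for n in hidden),
        # A rank-r update adds r * (d_out + d_in) values per weight matrix.
        "lora": sum(lora_rank * (d_out + d_in) for d_out, d_in in weight_shapes),
    }

counts = count_optimized_params([784, 512, 512, 10], lora_rank=8)
print(counts)  # {'weight_interpolation': 1, 'soft_permutation': 524288, 'lora': 22736}
```

Under these assumptions the soft-permutation count grows quadratically with hidden width, while the LoRA count grows linearly for a fixed rank.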
Taxonomy
Topics: Simulation Techniques and Applications · Time Series Analysis and Forecasting · Image Processing and 3D Reconstruction
