Model merging with SVD to tie the Knots

George Stoica; Pratik Ramesh; Boglarka Ecsedi; Leshem Choshen; Judy Hoffman

arXiv:2410.19735·cs.CV·October 28, 2024

TL;DR

This paper introduces KnOTS, a method that uses the SVD to align the weights of LoRA finetuned models before merging. It targets the weaker weight alignment of LoRA models relative to fully-finetuned ones and improves merging performance across vision and language tasks.

Contribution

We propose KnOTS, a novel SVD-based approach to improve the merging of LoRA finetuned models by enhancing their alignment, and introduce a new benchmark for evaluating merged model generality.

Findings

01. KnOTS improves LoRA merging performance by up to 4.3%.

02. The method enhances model alignment, leading to better task generalization.

03. A new benchmark evaluates the generality of merged models.

Abstract

Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3%…
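The joint-SVD idea in the abstract can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' implementation: the function name `knots_style_merge`, the `merge_fn` hook, the choice to concatenate updates along the input dimension, and the plain mean used as the merger are all assumptions made for the example.

```python
import numpy as np


def knots_style_merge(deltas, merge_fn=None):
    """Merge per-task weight updates in a shared SVD basis (illustrative).

    deltas: list of (d_out, d_in) arrays, one per task (e.g. B_i @ A_i).
    merge_fn: hypothetical hook taking an (n, k, d_in) array of per-task
        coefficient blocks and returning one (k, d_in) block. Defaults to
        a plain mean, standing in for any existing merging method.
    """
    if merge_fn is None:
        merge_fn = lambda blocks: blocks.mean(axis=0)

    n = len(deltas)
    d_in = deltas[0].shape[1]

    # Assumption: updates are concatenated along the input dimension.
    stacked = np.concatenate(deltas, axis=1)            # (d_out, n * d_in)

    # One SVD for all tasks: U and S are shared, Vt holds per-task blocks.
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    basis = U * S                                        # (d_out, k)

    # Slice Vt back into one (k, d_in) coefficient block per task.
    v_blocks = np.stack([Vt[:, i * d_in:(i + 1) * d_in] for i in range(n)])

    # Merge in the shared space, then map back to weight space.
    merged = basis @ merge_fn(v_blocks)                  # (d_out, d_in)
    return merged, basis, v_blocks


# Toy example: three rank-2 LoRA-style updates for an 8 x 6 layer.
rng = np.random.default_rng(0)
d_out, d_in, rank, n_tasks = 8, 6, 2, 3
deltas = [rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))
          for _ in range(n_tasks)]
merged, basis, v_blocks = knots_style_merge(deltas)
```

Because the default merger here is linear, `merged` equals the plain average of the updates. Any difference would only appear once `merge_fn` is replaced by a nonlinear merging method operating on the aligned blocks.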

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 3

Strengths

1. The concept of using SVD to align model weights for improved merging is novel and practical, addressing a previously unexplored limitation in merging LoRA models.
2. The paper is generally well-structured, with clear descriptions of both the KnOTS methodology and experimental setups.

Weaknesses

1. It would be beneficial if the authors could validate the effectiveness of the method on larger LLMs, such as LLaMA and Qwen2, for more in-depth evaluation.
2. Although the method is effective, the improvements are limited.

Reviewer 02·Rating 5·Confidence 3

Strengths

* Model merging, specifically for LoRA models, is an interesting and cutting-edge field, with related techniques being widely proposed and explored in recent years.
* KnOTS shows excellent performance across both vision and language tasks, enhancing merged model effectiveness and enabling better generalization on the newly introduced benchmark for multi-task data.

Weaknesses

* As far as I know, merging LoRA models using SVD is not a new technique; implementations have long been available in some open-source libraries and are widely used. Therefore, the innovation in this paper is questionable and appears limited. What are the differences and advantages of the techniques proposed in this paper compared to the SVD-based merging techniques in these libraries?
* The experimental work in this paper is insufficient and does not meet a certain standard; m

Reviewer 03·Rating 6·Confidence 3

Strengths

1. The authors show that CKA representations seem to align with model merging abilities, without some limitations given by orthogonality approaches.
2. The authors propose to merge the LoRA weights after SVD, and showcase better performance than using existing full-rank approaches.
3. The authors propose to evaluate merged models on a multi-task benchmark that they obtained by combining the individual datasets in Ilharco et al. (2023).

Weaknesses

1. While the performance of KnOTS-TIES is usually significantly better than TIES, it is not the case for DARE-TIES. This is not discussed in the paper, and it would be good to understand this behavior.
2. While the CKA alignment given by using KnOTS is significantly better than by the original LoRA weights, the performance improvements (e.g. on DARE-TIES) are less pronounced, leaving the reader wondering whether CKA is indeed a good-enough metric for weight alignment.
3. The authors should d

Code & Models

Repositories

gstoica27/knots
PyTorch·Official

Taxonomy

Topics·Image Processing and 3D Reconstruction