Exploring Model Kinship for Merging Large Language Models

Yedi Hu; Yunzhi Yao; Ningyu Zhang; Huajun Chen; Shumin Deng

arXiv:2410.12613 · cs.CL · September 24, 2025

TL;DR

This paper introduces the concept of model kinship to understand and improve the process of merging large language models, demonstrating that kinship-guided merging enhances performance and mitigates local optima issues.

Contribution

It proposes the novel concept of model kinship, analyzes its relation to merging gains, and develops a kinship-guided merging strategy to improve LLM evolution.
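The abstract defines model kinship only as "the degree of similarity or relatedness between LLMs." The following is a minimal sketch of how such a score could be computed. It assumes kinship is measured as a similarity between two fine-tuned models' delta parameters, meaning each model's weights minus those of a shared base model. Pearson correlation and cosine similarity are shown as two candidate metrics. The checkpoint format (a dict of flat float lists) and the names `delta_params`, `cosine`, and `kinship` are hypothetical illustrations, not the paper's or repository's API.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def delta_params(model: Weights, base: Weights) -> List[float]:
    """Flatten (fine-tuned - base) over all parameters, in a fixed key order."""
    flat: List[float] = []
    for name in sorted(base):
        flat.extend(m - b for m, b in zip(model[name], base[name]))
    return flat


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def kinship(a: Weights, b: Weights, base: Weights, metric: str = "pearson") -> float:
    """Assumed kinship: similarity of two models' delta parameters w.r.t. a shared base."""
    da, db = delta_params(a, base), delta_params(b, base)
    if metric == "cosine":
        return cosine(da, db)
    if metric == "pearson":
        # Pearson correlation equals the cosine similarity of mean-centred vectors.
        ma, mb = sum(da) / len(da), sum(db) / len(db)
        return cosine([x - ma for x in da], [y - mb for y in db])
    raise ValueError(f"unknown metric: {metric}")
```

Measuring similarity on deltas rather than raw weights keeps the large shared contribution of the base model from making every pair of fine-tuned models look nearly identical.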

Findings

1. Model kinship correlates with performance gains in merging.
2. Kinship-guided merging improves benchmark results.
3. The approach mitigates local optima during model evolution.
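The first finding is a correlation claim. The following is a minimal sketch of how such a claim can be checked: record one (kinship, merge gain) pair per merge and compute their Pearson correlation. It assumes merge gain is the merged model's benchmark score minus the mean of its two parents' scores. This definition is an assumption and is not stated on this page. `merge_gain` and `pearson` are hypothetical helper names.

```python
import math
from typing import Sequence


def merge_gain(merged_score: float, parent_a: float, parent_b: float) -> float:
    # Assumed definition: improvement of the merged model over its parents' mean score.
    return merged_score - (parent_a + parent_b) / 2


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between kinship values (xs) and merge gains (ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

The sign and strength of the resulting coefficient depend entirely on the merges measured. The sketch makes no assumption about which direction the relationship takes.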

Abstract

Model merging has emerged as a key technique for enhancing the capabilities and efficiency of Large Language Models (LLMs). The open-source community has driven model evolution by iteratively merging existing models, yet a principled understanding of the gains and underlying factors in model merging remains limited. In this work, we study model evolution through iterative merging, drawing an analogy to biological evolution, and introduce the concept of model kinship, the degree of similarity or relatedness between LLMs. Through comprehensive empirical analysis, we show that model kinship is closely linked to the performance improvements achieved by merging, providing a useful criterion for selecting candidate models. Building on this insight, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can improve benchmark performance. Specifically, we…
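The abstract names Top-k Greedy Merging with Model Kinship but is cut off before describing it. The following is a minimal sketch under stated assumptions, not the paper's algorithm:

- Each model is a flat weight vector sharing one base model.
- Merging is plain weight averaging.
- `evaluate` is any benchmark-scoring callable.
- Kinship is the cosine similarity of delta parameters.

In each generation, every pair among the top-k models is merged. The kinship step also merges the current best model with the population member least related to it, as a way to escape local optima. This step is an assumed reading of "kinship-guided." `average_merge`, `kinship`, and `top_k_greedy_merge` are hypothetical names.

```python
import itertools
import math
from typing import Callable, List, Tuple

Vec = List[float]  # hypothetical: a model represented as one flat weight vector


def average_merge(a: Vec, b: Vec) -> Vec:
    # Plain weight averaging; a stand-in for any real merge method.
    return [(x + y) / 2 for x, y in zip(a, b)]


def kinship(a: Vec, b: Vec, base: Vec) -> float:
    # Assumed kinship: cosine similarity of delta parameters w.r.t. the shared base.
    da = [x - z for x, z in zip(a, base)]
    db = [y - z for y, z in zip(b, base)]
    norm = math.hypot(*da) * math.hypot(*db)
    return sum(x * y for x, y in zip(da, db)) / norm if norm else 0.0


def top_k_greedy_merge(
    population: List[Vec],
    base: Vec,
    evaluate: Callable[[Vec], float],
    k: int = 3,
    generations: int = 5,
    use_kinship: bool = True,
) -> Tuple[Vec, float]:
    pool = list(population)
    seen = {tuple(m) for m in pool}  # skip models already present in the pool
    for _ in range(generations):
        ranked = sorted(pool, key=evaluate, reverse=True)
        # Greedy step: merge every pair among the current top-k models.
        children = [average_merge(a, b) for a, b in itertools.combinations(ranked[:k], 2)]
        if use_kinship and len(ranked) > 1:
            # Assumed exploration step: pair the best model with its least-related peer.
            best = ranked[0]
            farthest = min(ranked[1:], key=lambda m: kinship(best, m, base))
            children.append(average_merge(best, farthest))
        for child in children:
            if tuple(child) not in seen:
                seen.add(tuple(child))
                pool.append(child)
    best = max(pool, key=evaluate)
    return best, evaluate(best)
```

Setting `use_kinship=False` yields a plain top-k greedy baseline, so the two variants can be compared on the same population and evaluator.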

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

1. The research area is significant, and the introduction of the kinship metric, drawing an analogy to biological evolution, presents a human-like perspective that adds depth to the discussion of model merging.
2. The paper effectively establishes a connection between kinship and performance gains, providing empirical evidence that supports the proposed approach.

Weaknesses

1. The definition of the problem is unclear, and the paper lacks a thorough discussion of specific scientific issues within the model merging field. The research motivation is insufficient, leading to a lack of logical clarity in the arguments presented.
2. While introducing human-like concepts or analogies from other fields can be a valuable approach to problem-solving, the kinship metric designed in this paper lacks a reasonable formulation and effective validation, which undermines its appli…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

This paper targets an important task. It is also very complete and well-written: it introduces the concept of model kinship, empirically verifies its value, and finally provides a practical scenario in which model kinship is applied to improve model merging performance. I feel this is a complete work.

Weaknesses

1. The bold fonts in this paper emphasize the relationship between model merging and biological evolution. However, I do not quite see the connection. Biological evolution is based on natural selection, while in this paper, model merging is done based on model similarity. Can the authors further elaborate on this?
2. Since all the experiments in this paper are only verified on one model set and one task set, there is no guarantee that the proposed pipeline can be generalized to other real-worl…

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. The paper introduces a novel exploration of the kinship between LLMs, using a heuristic approach inspired by biological evolution to guide the selection of models for merging. The overall idea is interesting.
2. The narrative of the paper is well-structured, and the authors validate their experiments using Mistral as the primary architecture.

Weaknesses

1. The experimental section lacks clarity. It is recommended that the authors add a dedicated subsection analyzing the datasets used and explaining why these specific datasets (e.g., general-purpose or domain-specific test data) were chosen for performance evaluation.
2. In the appendix (Figure 8), the authors present performance analysis based on various merging experiments using the Mistral architecture. However, it is unclear whether the results are applicable to other architectures like LL…

Code & Models

Repositories

zjunlp/ModelKinship · pytorch · Official

Taxonomy

Topics: Topic Modeling