# Rethinking Layer-wise Model Merging through Chain of Merges

**Authors:** Pietro Buzzega, Riccardo Salami, Angelo Porrello, Simone Calderara

arXiv: 2508.21421 · 2026-02-26

## TL;DR

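This paper introduces Chain of Merges (CoM), a layer-wise model merging method that merges weights one layer at a time and updates activation statistics after each step, so that changes in early layers propagate to the layers merged after them. The authors report state-of-the-art results on standard benchmarks.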

## Contribution

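The paper proposes CoM, a merging procedure that explicitly accounts for inter-layer dependencies. By merging layers sequentially and updating activation statistics as it goes, it mitigates the internal covariate shift that arises when layers are merged independently.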
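
One way to make the inter-layer dependency concrete is the following hedged formalization. It assumes linear layers and a RegMean-style least-squares objective, neither of which is stated in this summary, and the symbols $K$, $W_k^{(\ell)}$, $X_k^{(\ell)}$ and $\hat{X}_k^{(\ell)}$ are illustrative notation rather than the paper's. Merging layer $\ell$ in isolation fits the merged weight on the inputs $X_k^{(\ell)}$ recorded inside each fine-tuned model $k$:

$$
W^{(\ell)}_{\text{indep}} = \arg\min_{W} \sum_{k=1}^{K} \left\| W X_k^{(\ell)} - W_k^{(\ell)} X_k^{(\ell)} \right\|_F^2
$$

At inference, however, layer $\ell$ of the merged model receives $\hat{X}_k^{(\ell)}$, the output of the already merged layers $1, \dots, \ell-1$, which generally differs from $X_k^{(\ell)}$. A chained variant under the same assumptions conditions each layer on those inputs, starting from $\hat{X}_k^{(1)} = X_k^{(1)}$:

$$
W^{(\ell)}_{\text{chain}} = \arg\min_{W} \sum_{k=1}^{K} \left\| W \hat{X}_k^{(\ell)} - W_k^{(\ell)} X_k^{(\ell)} \right\|_F^2
$$

Each solution is optimal only given the layers merged before it, which is one reading of the abstract's "series of conditionally optimal updates".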

## Key findings

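- Merging layers independently causes distributional mismatches, particularly in methods that rely on intermediate activations, because changes in early layers are not propagated to downstream layers.
- These mismatches are a form of internal covariate shift; sequentially updating activation statistics during merging mitigates it.
- CoM achieves state-of-the-art performance on standard model merging benchmarks.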

## Abstract

Fine-tuning pretrained models has become a standard pathway to achieve state-of-the-art performance across a wide range of domains, leading to a proliferation of task-specific model variants. As the number of such specialized models increases, merging them into a unified model without retraining has become a critical challenge. Existing merging techniques operate at the level of individual layers, thereby overlooking the inter-layer dependencies inherent in deep networks. We show that this simplification leads to distributional mismatches, particularly in methods that rely on intermediate activations, as changes in early layers are not properly propagated to downstream layers during merging. We identify these mismatches as a form of internal covariate shift, comparable to the phenomenon encountered in the initial phases of neural network training. To address this, we propose Chain of Merges (CoM), a layer-wise merging procedure that sequentially merges weights across layers while sequentially updating activation statistics. By explicitly accounting for inter-layer interactions, CoM mitigates covariate shift and produces a coherent merged model through a series of conditionally optimal updates. Experiments on standard benchmarks demonstrate that CoM achieves state-of-the-art performance.
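
The sequential procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a bias-free ReLU MLP, a ridge-regularized least-squares merge per layer (a RegMean-style stand-in, since this summary does not give the paper's update rule), and one batch of inputs per task. The names `merge_layer`, `chain_of_merges`, `task_models`, `task_data` and `ridge` are hypothetical. Rows of each batch are samples.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def merge_layer(merged_inputs, targets, ridge=1e-6):
    """Least-squares weight W (d_out x d_in) with x_hat @ W.T ~= target, summed over tasks."""
    d_in = merged_inputs[0].shape[1]
    gram = ridge * np.eye(d_in)                      # accumulated input statistics
    cross = np.zeros((d_in, targets[0].shape[1]))    # input/target cross terms
    for x_hat, y in zip(merged_inputs, targets):
        gram += x_hat.T @ x_hat
        cross += x_hat.T @ y
    return np.linalg.solve(gram, cross).T

def chain_of_merges(task_models, task_data, ridge=1e-6):
    """task_models[k]: list of weight matrices; task_data[k]: (n, d_in) batch for task k."""
    n_layers = len(task_models[0])
    own = [x.copy() for x in task_data]      # inputs seen inside each fine-tuned model
    merged = [x.copy() for x in task_data]   # inputs seen inside the merged model so far
    merged_weights = []
    for layer in range(n_layers):
        # Targets: what each fine-tuned layer outputs on its own inputs.
        targets = [x @ model[layer].T for model, x in zip(task_models, own)]
        # Fit the merged layer on the inputs it will actually receive.
        w = merge_layer(merged, targets, ridge)
        merged_weights.append(w)
        # Propagate both streams so the next layer uses updated statistics.
        act = (lambda z: z) if layer == n_layers - 1 else relu
        own = [act(t) for t in targets]
        merged = [act(x @ w.T) for x in merged]
    return merged_weights
```

The design point to notice is the `merged` stream: after each layer is merged, the activations are recomputed through the merged weights, so the statistics used for the next layer reflect the merged model rather than the original fine-tuned ones. Replacing `merged` with `own` in the call to `merge_layer` would give the independent, layer-in-isolation variant under the same assumptions.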

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21421/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21421/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/2508.21421/full.md

---
Source: https://tomesphere.com/paper/2508.21421