Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs

Xuan Ding; Rui Sun; Yunjian Zhang; Xiu Yan; Yueqi Zhou; Kaihao Huang; Suzhong Fu; Angelica I Aviles-Rivero; Chuanlong Xie; Yao Zhu

arXiv:2502.19159 · cs.CV · March 20, 2026

TL;DR

This paper introduces Sliding-Window Merging, a dynamic compression technique that merges similar consecutive layers in large language models to reduce redundancy and maintain performance, outperforming existing pruning methods.

Contribution

The paper presents a novel layer merging approach based on functional similarity, effectively simplifying LLMs while preserving their inference capabilities.
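As a rough illustration of what "functional similarity" between two layers can mean, the sketch below computes centered kernel alignment (CKA) with a linear kernel between two sets of layer outputs. This is an assumption for illustration only: the listing does not say which similarity measure the paper uses, and the names `linear_gram`, `center_gram`, and `cka` are hypothetical.

```python
import numpy as np


def linear_gram(x: np.ndarray) -> np.ndarray:
    """Gram matrix of a linear kernel; x has shape (num_samples, hidden_dim)."""
    return x @ x.T


def center_gram(k: np.ndarray) -> np.ndarray:
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def cka(kx: np.ndarray, ky: np.ndarray) -> float:
    """Centered kernel alignment between two Gram matrices (1.0 means identical up to scale)."""
    kx_c = center_gram(kx)
    ky_c = center_gram(ky)
    hsic = np.sum(kx_c * ky_c)  # equals trace(kx_c @ ky_c) for symmetric matrices
    return float(hsic / (np.linalg.norm(kx_c) * np.linalg.norm(ky_c)))


# Hypothetical layer outputs: 20 samples, 5 hidden features each.
rng = np.random.default_rng(0)
out_a = rng.normal(size=(20, 5))
out_b = rng.normal(size=(20, 5))

self_score = cka(linear_gram(out_a), linear_gram(out_a))
cross_score = cka(linear_gram(out_a), linear_gram(out_b))
```

Evaluating such a score for every pair of layers yields a layer-by-layer similarity matrix; a different kernel (for example an RBF kernel) could be swapped in for `linear_gram` without changing `cka`.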

Findings

01 Outperforms existing pruning techniques in zero-shot inference.

02 Achieves a 1.654% performance improvement with 35% pruning on Vicuna-7B.

03 Demonstrates the potential of combining depth and width pruning.

Abstract

Depth-wise pruning accelerates LLM inference in resource-constrained scenarios but suffers from performance degradation because it removes entire Transformer layers outright. This paper reveals "patch-like" redundancy across layers via correlation analysis of the outputs of different layers in a reproducing kernel Hilbert space, demonstrating that consecutive layers exhibit high functional similarity. Building on this observation, the paper proposes Sliding-Window Merging (SWM), a dynamic compression method that selects consecutive layers from top to bottom using a pre-defined similarity threshold and compacts patch-redundant layers through parameter consolidation, thereby simplifying the model structure while maintaining its performance. Extensive experiments on LLMs with various architectures and different parameter scales show that our method outperforms existing pruning techniques in…
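The selection-and-merge idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a precomputed layer-by-layer similarity matrix `sim`, grows each window downward while the next layer stays above the threshold relative to the window's top layer, and uses a plain element-wise mean as a stand-in for the paper's parameter consolidation. The names `select_windows`, `consolidate`, and `merge_model` are hypothetical.

```python
from typing import Dict, List

import numpy as np

Layer = Dict[str, np.ndarray]  # parameter name -> weight array (assumed layout)


def select_windows(sim: np.ndarray, threshold: float) -> List[List[int]]:
    """Greedily group consecutive layers, scanning from the top layer down."""
    windows = []
    high = sim.shape[0] - 1
    while high >= 0:
        low = high
        # Extend the window downward while the next lower layer is still
        # similar enough to the window's top layer (assumed criterion).
        while low - 1 >= 0 and sim[high, low - 1] >= threshold:
            low -= 1
        windows.append(list(range(low, high + 1)))
        high = low - 1  # slide to the layer just below the finished window
    return windows[::-1]  # report windows in bottom-to-top order


def consolidate(layers: List[Layer], window: List[int]) -> Layer:
    """Stand-in consolidation: element-wise mean of the window's parameters."""
    names = layers[window[0]].keys()
    return {name: np.mean([layers[i][name] for i in window], axis=0) for name in names}


def merge_model(layers: List[Layer], sim: np.ndarray, threshold: float) -> List[Layer]:
    """Replace each selected window of layers with one consolidated layer."""
    return [consolidate(layers, window) for window in select_windows(sim, threshold)]


# Toy example: four layers whose similarity is high only within pairs (0, 1) and (2, 3).
toy_sim = np.array([
    [1.00, 0.95, 0.20, 0.10],
    [0.95, 1.00, 0.30, 0.20],
    [0.20, 0.30, 1.00, 0.92],
    [0.10, 0.20, 0.92, 1.00],
])
toy_layers = [{"w": np.full((2, 2), float(i))} for i in range(4)]
toy_windows = select_windows(toy_sim, threshold=0.9)
toy_merged = merge_model(toy_layers, toy_sim, threshold=0.9)
```

In this toy setup the four layers collapse into two merged layers. A higher threshold keeps more layers separate, while a lower one merges more aggressively; the actual expansion criterion and consolidation rule are defined in the paper and its repository.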

Code & Models

Repositories

920927/slm-a-sliding-layer-merging-method
PyTorch · Official

Taxonomy

Methods: Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Label Smoothing · Multi-Head Attention · Dense Connections · Adam