Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang, Haris Šikić, Lothar Thiele, Olga Saukh

TL;DR
This paper presents model folding, a data-free compression method that merges structurally similar neurons across layers. It significantly reduces model size without fine-tuning or access to training data, and it remains effective on large models.
Contribution
The paper introduces a novel data-free model compression technique called model folding that merges similar neurons across layers, preserving data statistics and avoiding fine-tuning.
Findings
Achieves comparable performance to data-driven methods.
Outperforms recent data-free compression techniques.
Effective for large-scale models at high sparsity.
Abstract
We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers, significantly reducing the model size without the need for fine-tuning or access to training data. Unlike existing methods, model folding preserves data statistics during compression by leveraging k-means clustering, and using novel data-free techniques to prevent variance collapse or explosion. Our theoretical framework and experiments across standard benchmarks, including ResNet18 and LLaMA-7B, demonstrate that model folding achieves comparable performance to data-driven compression techniques and outperforms recently proposed data-free methods, especially at high sparsity levels. This approach is particularly effective for compressing large-scale models, making it suitable for deployment in resource-constrained environments.
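To make the idea of merging similar neurons with k-means concrete, here is a minimal sketch for a single hidden layer of a ReLU MLP. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_labels`, `fold_hidden_layer`, `forward`), the choice to cluster on incoming weights plus bias, and the farthest-point initialisation are all hypothetical. Neurons in the same cluster are replaced by their centroid, and their outgoing weights are summed so the next layer still receives their combined contribution. The sketch omits the variance-repair step that the abstract mentions.

```python
def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-point initialisation (deterministic)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fold_hidden_layer(W1, b1, W2, k):
    """Fold the hidden layer of y = W2 @ relu(W1 @ x + b1) down to at most k neurons.

    Assumption of this sketch: neurons are clustered on [incoming weights, bias].
    """
    feats = [list(row) + [b] for row, b in zip(W1, b1)]
    labels = kmeans_labels(feats, k)
    groups = [[i for i, l in enumerate(labels) if l == c] for c in range(k)]
    groups = [g for g in groups if g]  # drop empty clusters
    n_in = len(W1[0])
    # Incoming weights and biases: cluster centroid (mean of the members).
    W1f = [[sum(W1[i][j] for i in g) / len(g) for j in range(n_in)] for g in groups]
    b1f = [sum(b1[i] for i in g) / len(g) for g in groups]
    # Outgoing weights: sum over the members, so merged neurons keep their total effect.
    W2f = [[sum(row[i] for i in g) for g in groups] for row in W2]
    return W1f, b1f, W2f


def forward(W1, b1, W2, x):
    """Two-layer ReLU MLP without an output bias."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]


if __name__ == "__main__":
    # Toy layer with two pairs of duplicated neurons (hypothetical values).
    W1 = [[1.0, 2.0], [1.0, 2.0], [-1.0, 0.5], [-1.0, 0.5]]
    b1 = [0.1, 0.1, -0.2, -0.2]
    W2 = [[0.5, 0.25, 1.0, -2.0]]
    W1f, b1f, W2f = fold_hidden_layer(W1, b1, W2, k=2)
    x = [0.3, 0.7]
    print(forward(W1, b1, W2, x), forward(W1f, b1f, W2f, x))
```

When the neurons in a cluster are exact duplicates, as in the toy layer, folding leaves the network output unchanged up to floating-point error. When they are only similar, averaging them changes the activation statistics, which is the variance collapse or explosion that the abstract says the method corrects without data.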
Peer Reviews
Decision: ICLR 2025 Poster
I think conceptually connecting the research and recent methods in neuron alignment / neural network symmetry (Yamada et al., 2023; Ainsworth et al., 2023) to the problem of model compression is somewhat novel and deserves more attention, although it certainly has been done many times, see e.g., [Zhou et al., 2018](https://arxiv.org/abs/1804.05862) and the paper [Chen et al., 2023](https://arxiv.org/pdf/2310.06756) cited in the current work. Methodologically the contribution seems fairly increme
1. The paper is missing related literature and baselines on model quantization. A simple Google search suggests quite a few related papers, e.g., using vector quantization for model compression [Martinez et al., 2021](https://openaccess.thecvf.com/content/CVPR2021/papers/Martinez_Permute_Quantize_and_Fine-Tune_Efficient_Compression_of_Neural_Networks_CVPR_2021_paper.pdf) and post-training quantization [Nagel et al., 2021](https://arxiv.org/pdf/2106.08295), which is also training-free.
2. Som
1. The method has both theoretical justification and empirical support, demonstrating that k-means clustering is an optimal method for weight fusion in a data-free manner. Results from benchmarks like ResNet18 and LLaMA-7B show that model folding achieves performance on par with or surpasses existing data-driven and data-free compression methods, particularly at high sparsity levels.
2. Model folding is designed to be completely data-free, which differentiates it from other compression methods
1. More data-free and training-free references are needed, such as [1].
2. Lack of sufficient experimental results on compression ratio and speedup ratio.

[1] Haroush, Matan, et al. "The knowledge within: Methods for data-free model compression." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
1. Previous methods typically rely on data-driven fine-tuning to restore the performance of compressed models. This paper addresses this limitation by introducing a data-free method that leverages generated data and an efficient activation repair process to recover model accuracy, making it highly practical for real-world applications.
2. The paper includes extensive experiments on both vision and language models, with clear visualizations and figures that offer detailed insights into the method
1. My main concern with this submission is the limited technical novelty. The work largely combines existing methods, including similarity-based model merging for compression [1], model inversion for data generation [2], and REPAIR [3] for statistics alignment. While these components are integrated effectively, the key contributions could be further emphasized to distinguish this work.
2. Although the paper claims the method is data-free, this feature is primarily enabled by DeepInversion rather
Taxonomy
Topics: Scientific Computing and Data Management · Big Data and Digital Economy
