Towards Meta-Pruning via Optimal Transport
Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

TL;DR
This paper proposes Intra-Fusion, a novel neural network pruning method using optimal transport that achieves high accuracy without fine-tuning and reduces training time, offering a new direction in model compression.
Contribution
Intra-Fusion redefines pruning by leveraging model fusion and optimal transport, eliminating the need for importance metric design and fine-tuning.
Findings
Achieves substantial accuracy recovery without fine-tuning.
Reduces training time while maintaining performance.
Effective across various networks and datasets.
Abstract
Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importance metrics, Intra-Fusion redefines the overlying pruning procedure. Through utilizing the concepts of model fusion and Optimal Transport, we leverage an agnostically given importance metric to arrive at a more effective sparse model representation. Notably, our approach achieves substantial accuracy recovery without the need for resource-intensive fine-tuning, making it an efficient and promising tool for neural network compression. Additionally, we explore how fusion can be added to the…
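The idea in the abstract, using Optimal Transport to merge neurons into a smaller layer instead of discarding them, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`sinkhorn`, `fuse_prune_layer`), the L1-norm importance score, the uniform target mass, the squared-Euclidean cost, and the entropy-regularized Sinkhorn solver are all assumptions chosen for illustration; the paper may use a different solver, cost, and mass assignment.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.05, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (assumed solver)."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # scale columns toward marginal b
        u = a / (K @ v)                  # scale rows toward marginal a
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, k)

def fuse_prune_layer(W, keep):
    """Shrink a layer from n to `keep` neurons by transporting all n neurons
    onto the `keep` most important ones (hypothetical helper).

    W has shape (n_neurons, n_inputs); each row is one neuron's weights.
    """
    # Assumed importance metric: L1 norm of each neuron's weights.
    importance = np.abs(W).sum(axis=1)
    kept = np.sort(np.argsort(importance)[-keep:])

    # Source mass follows importance; target mass is uniform (both assumptions).
    a = importance / importance.sum()
    b = np.full(keep, 1.0 / keep)

    # Cost: squared Euclidean distance between each neuron and each kept neuron,
    # rescaled to [0, 1] so the regularization strength is scale-free.
    diff = W[:, None, :] - W[kept][None, :, :]
    cost = (diff ** 2).sum(axis=2)
    cost = cost / cost.max()

    T = sinkhorn(a, b, cost)

    # Barycentric projection: each new neuron is a convex combination of the
    # original neurons, weighted by the mass transported to it.
    W_fused = (T / T.sum(axis=0, keepdims=True)).T @ W
    return kept, W_fused

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 5))          # toy layer: 8 neurons, 5 inputs
    kept, W_fused = fuse_prune_layer(W, keep=3)
    print(W_fused.shape)                 # (3, 5)
```

The sketch only rewrites the weights of a single layer. Because the layer's output dimension shrinks, the input dimension of the following layer would have to be adjusted to match, which is not shown here.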
Peer Reviews
Decision: ICLR 2024 spotlight
1. Improved accuracy: The article demonstrates that Intra-Fusion can significantly enhance the accuracy of pruned models without relying on any additional data. By merging similar neurons, Intra-Fusion better preserves the output of the original non-pruned model, leading to superior performance. 2. Data-free pruning: Pruning neural networks usually results in immediate drops in accuracy, requiring extensive fine-tuning. However, the article argues that with Intra-Fusion, a significant amount of
1. This article has severe writing issues:
   + In the second paragraph of section 3.1, Figure 6 appears multiple times. I believe it should be Figure 1.
   + In Algorithm 1, $\text{neuron } j \in \text{layer} \land i[j] \ge t$ represents a logical "AND" relationship, not a neuron.
   + The text contains many long and heavily clause-laden sentences, which pose a significant obstacle to understanding the article. I suggest avoiding such expressions as much as possible in academic papers, for example, in the senten
1. Most network pruning methods still rely on an excessive retraining process. This paper proposes a method to save the retraining, which potentially is of broad interest. 2. The proposed method uses OT to merge networks for model compression, unlike most of the conventional ways, which sounds novel to me. 3. The empirical results suggest the method is more effective than the default pruning scheme, especially without finetuning.
1. My biggest concern is about the empirical results. 1.1 Currently, it only compares with the default pruning for the main benchmark results (Tab. 1, Fig. 3 and 4). This looks quite limited to me. How is the method compared to other recent top-performing structured pruning methods like [*1 - *3]? It is highly advisable to add a set of comparisons with ResNet50 on ImageNet (as far as I know, this is the standard benchmark setup in a typical pruning paper). 1.2 Based on Tab. 1, after finetunin
1. The paper is well-organized and clearly written, which is easy to follow. 2. The problem studied in this paper is interesting and valuable. 3. The experimental verification is quite sufficient.
1. This paper presents extensive experiments across various settings. However, there are areas that could benefit from further exploration: It would be valuable to see comparative results with methods like the LOTTERY TICKET HYPOTHESIS [A]. How does the proposed approach stack up against such established techniques? 2. The FaP approach introduced in this paper seems to assume that the two models being fused have identical structures. It raises the question of how adaptable this method is. Can it
Taxonomy
Topics: Modular Robots and Swarm Intelligence · DNA and Biological Computing
Methods: Pruning · Focus
