IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization
Aleksei Samarin, Artem Nazarenko, Egor Kotenko, Valentin Malykh, Alexander Savelev, Aleksei Toropov

TL;DR
This paper introduces IDAP++, a divergence-based neural network pruning method that optimizes filters and layers by analyzing information flow divergence, leading to significant compression while maintaining accuracy across diverse architectures.
Contribution
The paper proposes a unified framework using tensor flow divergence for filter and layer-level pruning, adaptable to various neural network architectures, improving compression efficiency.
Findings
Achieves substantial parameter reduction across multiple architectures.
Maintains competitive accuracy after pruning.
Outperforms state-of-the-art methods in model compression.
Abstract
This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical information pathways. The second stage extends this principle to higher-level architecture optimization by analyzing layer-wise contributions to information propagation and selectively eliminating entire layers that demonstrate minimal impact on network performance. The proposed method naturally adapts to diverse architectures, including convolutional networks, transformers, and hybrid designs,…
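The two-stage process described in the abstract can be illustrated with a minimal sketch. The abstract does not define the divergence metric, so the scores below are stand-ins chosen for illustration only: a per-filter L2 norm for stage one, and the relative change between a layer's input and output for stage two. All function names (`filter_scores`, `prune_filters`, `layer_scores`, `select_layers_to_drop`) and the `ratio` and `threshold` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def filter_scores(weight):
    """Stand-in filter score: L2 norm of each output filter.

    `weight` has shape (out_channels, ...). This is an assumed proxy,
    not the paper's divergence metric.
    """
    return np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)

def prune_filters(weight, ratio):
    """Stage 1 (sketch): drop roughly `ratio` of the lowest-scoring filters."""
    scores = filter_scores(weight)
    n_keep = max(1, int(round(weight.shape[0] * (1.0 - ratio))))
    # Indices of the n_keep highest scores, returned in original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weight[keep], keep

def layer_scores(activations):
    """Stand-in layer score: relative change between a layer's input and output.

    activations[i] is the input to layer i and activations[i + 1] its output;
    the two must have the same shape (as in a residual-style block).
    """
    return [
        float(np.linalg.norm(out - inp) / (np.linalg.norm(inp) + 1e-12))
        for inp, out in zip(activations[:-1], activations[1:])
    ]

def select_layers_to_drop(activations, threshold):
    """Stage 2 (sketch): flag layers that barely change their input."""
    return [i for i, s in enumerate(layer_scores(activations)) if s < threshold]
```

In a real pipeline each stage would be followed by fine-tuning and re-evaluation, which this sketch omits; it only shows how a per-filter score and a per-layer score could drive the two removal decisions.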
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. Theoretical Novelty and Mathematical Rigor: The paper’s greatest contribution is its rigorous theoretical foundation. By modeling neural networks as continuous dynamical systems and introducing the Information Flow Divergence metric, the authors provide a principled and architecture-agnostic means to quantify information propagation. This formulation bridges information theory and dynamical systems analysis, moving beyond empirical heuristics commonly used in pruning literature. The inclusion
1. Computational Overhead and Practicality: While the paper claims that flow computation has O(L) complexity, the full pipeline involves iterative divergence evaluation, fine-tuning at multiple stages, and layer-level recomputation, which could lead to significant computational overhead for very large models (e.g., GPT-scale). The paper would benefit from a quantitative comparison of latency and resource costs against baselines such as LTH and RigL.
2. Hyperparameter Sensitivity and Usabilit
- The paper is well-written and clearly structured.
- The paper proposes a novel, mathematically grounded framework that uniquely combines both filter-level and layer-level pruning, setting it apart from much of the heuristic-driven prior work.
- A large number of experiments on different models and datasets demonstrate the effectiveness of IDAP++.
- There is no detailed profiling of actual computational overhead (e.g., for models with hundreds of layers or very high input/output dimensions), especially during repeated recomputation after every pruning phase.
- This paper lacks ablation experiments and more comprehensive comparisons with the baseline.
- The paper proposes a unified framework but employs two different types of metrics: a norm-based metric for filter pruning and a difference-based metric for layer pruning. The paper does not
- The paper is clearly written and well-structured, making it easy to follow. The integration of theoretical derivations with algorithmic descriptions and pseudocode effectively bridges intuition and implementation.
- The introduction of the information flow divergence metric provides a rigorous mathematical framework for quantifying information propagation within deep neural networks. The concept of divergence is explicitly formulated for the most fundamental types of layers of neural network
- Some key mathematical statements, such as Lemma 1, Lemma 2, and Theorem 1, are presented without formal proofs or sufficient derivations. This omission weakens the theoretical rigor of the proposed information flow divergence formulation and leaves open questions about the validity and generality of the results. Providing at least proof sketches or detailed references would enhance credibility.
- The process of testing multiple configurations to find the optimal pruning ratio $\rho^*$ likely re
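To make the cost concern about the pruning-ratio search concrete, here is a minimal sketch of what testing multiple configurations could look like. It is an assumption about the general shape of such a search, not the paper's procedure: `search_pruning_ratio`, `evaluate`, `candidates`, `baseline_accuracy`, and `max_drop` are hypothetical names, and `evaluate` stands in for a full prune, fine-tune, and validate cycle.

```python
from typing import Callable, Iterable, Optional

def search_pruning_ratio(
    evaluate: Callable[[float], float],
    candidates: Iterable[float],
    baseline_accuracy: float,
    max_drop: float,
) -> Optional[float]:
    """Return the largest candidate ratio whose accuracy drop stays within max_drop.

    Every candidate is evaluated (no early stopping), because accuracy is not
    guaranteed to decrease monotonically with the pruning ratio. Returns None
    if no candidate satisfies the tolerance.
    """
    best = None
    for ratio in sorted(candidates):
        accuracy = evaluate(ratio)  # hypothetical: prune, fine-tune, validate
        if baseline_accuracy - accuracy <= max_drop:
            best = ratio
    return best
```

The total cost is one `evaluate` call per candidate, so a sweep over k ratios costs roughly k fine-tuning runs, which is why a coarse candidate grid or a cheaper proxy for `evaluate` may be preferable for large models.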
Taxonomy
Topics: Advanced Neural Network Applications · Software-Defined Networks and 5G · Neural Networks and Reservoir Computing
