Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning
Yewei Liu, Xiyuan Wang, Muhan Zhang

TL;DR
This paper introduces a universal meta-learning framework using graph metanetworks for network pruning, capable of automatically learning complex pruning strategies applicable to various network types without additional training.
Contribution
It presents a novel graph metanetwork approach for network pruning, enabling automatic learning of pruning rules with high generality and transferability across different network architectures.
Findings
Achieves state-of-the-art pruning results on CNNs and Transformers.
Demonstrates high generality and transferability of the pruning framework.
Eliminates the need for special training during pruning.
Abstract
We propose an entirely new meta-learning framework for network pruning. It is a general framework that can theoretically be applied to almost all types of networks and all kinds of pruning, and it has strong generality and transferability. Experiments show that it achieves outstanding results on many popular and representative pruning tasks, covering both CNNs and Transformers. Prior works either rely on fixed, hand-crafted criteria that prune in a coarse manner, or use learning-to-prune methods that require special training for every pruning run and lack generality. In contrast, our framework learns complex pruning rules automatically via a neural network (a metanetwork) and is general enough to prune without any special training. More specifically, we introduce the recently developed idea of metanetworks from meta-learning into pruning. A metanetwork is a network that…
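The abstract stops before describing the mechanics. The reviews describe the method as a metanetwork that transforms a trained network into a pruning-friendly one before pruning. The NumPy sketch below illustrates only that general pipeline and is not the authors' implementation. It rests on several assumptions:

- The network is a bias-free MLP rather than a CNN or Transformer.
- The network is encoded as a graph with one node per neuron and one edge per weight.
- The "metanetwork" is a single hand-written message-passing step with fixed parameters `theta` instead of a trained graph neural network.
- Pruning removes the hidden neurons with the smallest L2 norm of incoming weights.

All function names are hypothetical.

```python
import numpy as np

def mlp_to_graph(weights):
    """Encode a bias-free MLP as a graph: one node per neuron, one edge per weight.

    weights[l] has shape (n_out, n_in). Returns (num_nodes, src, dst, edge_w, sizes),
    with edges listed layer by layer in row-major order of each weight matrix.
    """
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.cumsum([0] + sizes)  # global index of each layer's first neuron
    src, dst, edge_w = [], [], []
    for l, W in enumerate(weights):
        out_idx, in_idx = np.indices(W.shape).reshape(2, -1)
        src.append(offsets[l] + in_idx)       # edge goes from an input neuron ...
        dst.append(offsets[l + 1] + out_idx)  # ... to an output neuron
        edge_w.append(W.ravel())              # same row-major order as the indices
    return int(offsets[-1]), np.concatenate(src), np.concatenate(dst), np.concatenate(edge_w), sizes

def toy_metanetwork(num_nodes, src, dst, edge_w, theta):
    """One hand-written message-passing step that rewrites every weight.

    Hypothetical stand-in for a trained graph metanetwork: theta = (a, b) is fixed here.
    """
    a, b = theta
    h = np.zeros(num_nodes)
    np.add.at(h, src, np.abs(edge_w))  # node feature: total |weight| on outgoing edges
    np.add.at(h, dst, np.abs(edge_w))  # ... plus total |weight| on incoming edges
    gate = 1.0 / (1.0 + np.exp(-(a * (h[src] + h[dst]) + b)))  # per-edge gate in (0, 1)
    return edge_w * gate

def graph_to_mlp(edge_w, sizes):
    """Invert mlp_to_graph: split the edge features back into weight matrices."""
    mats, start = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        mats.append(edge_w[start:start + n_in * n_out].reshape(n_out, n_in))
        start += n_in * n_out
    return mats

def prune_hidden_neurons(weights, keep_ratio):
    """Structured pruning: keep the hidden neurons with the largest incoming-weight L2 norm."""
    weights = [W.copy() for W in weights]
    for l in range(len(weights) - 1):  # output neurons are never pruned
        scores = np.linalg.norm(weights[l], axis=1)
        k = max(1, int(round(keep_ratio * len(scores))))
        keep = np.sort(np.argsort(scores)[-k:])
        weights[l] = weights[l][keep]             # drop the pruned neurons' rows
        weights[l + 1] = weights[l + 1][:, keep]  # and the matching input columns
    return weights

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(6, 8)), rng.normal(size=(3, 6))]
num_nodes, src, dst, edge_w, sizes = mlp_to_graph(weights)
new_w = toy_metanetwork(num_nodes, src, dst, edge_w, theta=(0.5, -1.0))
pruned = prune_hidden_neurons(graph_to_mlp(new_w, sizes), keep_ratio=0.5)
print([W.shape for W in pruned])  # [(4, 4), (3, 4), (3, 3)]
```

Meta-training the metanetwork is omitted. In a real system its parameters would be learned so that the transformed network loses little accuracy when pruned. In this toy, `theta` is simply fixed.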
Peer Reviews
Decision: Submitted to ICLR 2026
- Introducing a metanetwork yields a reasonable accuracy improvement over the no-metanetwork baseline.
- Multiple ablation studies support the effectiveness of the proposed approach.
- The paper is well written and easy to follow.
- The proposed metanetwork exhibits limited transferability across diverse model architectures and datasets. Although the paper claims transferability as a key advantage (Section 4.4), the transfer experiments are restricted to highly similar settings: architecturally similar networks (ResNet56 vs. ResNet110) and datasets of similar scale and domain (CIFAR10/100/SVHN). To demonstrate the claimed transferability more convincingly, it would be better to include: (1) transfer across substantially diffe…
Its core advantage is that it eliminates the need for specialized training every time a network is pruned: once trained, a metanetwork can prune networks of any architecture. It also demonstrates strong generalization when transferring across datasets and architectures. Experimental details are fully described.
The main text often uses expressions such as "no prior work has done something like this before" and "universally applicable". However, using metanetworks to predict network transformations for pruning is highly similar to the previous meta-learning pruning paradigm. In the numerical experiments, the performance advantage is not significant enough. On ResNet56/CIFAR-10, accuracy is not substantially different from DepGraph and ATO, although there is some improvement in “Speed Up”; on VGG19…
The idea of using a meta-network to transform a neural network into a pruning-friendly one is interesting and new to me. The proposed method is designed and executed with proper attention to detail. Experimental results on three classical CNN pruning tasks and a ViT show good accuracy and speed-up curves. The proposed method is shown to be robust to the choice of pruning criterion and to transfer well across similar datasets and model architectures without retraining the meta-network.
The main results table shows good numbers, but more baselines from recent structured, unstructured, and hypernetwork pruning approaches would be desirable. The paper shows improved trade-off curves due to the meta-network, but there is limited analysis of which parameter and feature statistics (such as channel saliency and inter-layer correlation) change, or of how much each model feature (such as BN statistics, kernel encoding, and residual edges) contributes to performance. In general, I am not tot…
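The reviews compare methods partly on "Speed Up". In channel-pruning papers this is commonly reported as the ratio of FLOPs (or multiply-accumulates, MACs) before and after pruning. The sketch below assumes that definition and uses a toy model: a plain chain of 3x3, stride-1 convolutions on a 32x32 feature map, not any network from the paper. Function names are hypothetical.

```python
def conv_chain_macs(channels, kernel=3, hw=32):
    """Multiply-accumulates of a chain of stride-1, 'same'-padded kernel x kernel convs.

    channels = [c_0, c_1, ..., c_L]; layer i maps c_i -> c_{i+1} channels on an hw x hw map.
    BN, activations, pooling, and the classifier head are ignored.
    """
    return sum(c_in * c_out * kernel * kernel * hw * hw
               for c_in, c_out in zip(channels[:-1], channels[1:]))

def speed_up(channels, keep_ratio, kernel=3, hw=32):
    """MACs(original) / MACs(pruned) when every hidden layer keeps keep_ratio of its channels.

    The input channels (c_0) and the final output channels (c_L) are left unpruned.
    """
    pruned = ([channels[0]]
              + [max(1, int(round(c * keep_ratio))) for c in channels[1:-1]]
              + [channels[-1]])
    return conv_chain_macs(channels, kernel, hw) / conv_chain_macs(pruned, kernel, hw)

print(round(speed_up([3, 16, 16, 16], keep_ratio=0.5), 2))  # 2.59
```

Halving both the input and output channels of an interior layer cuts its cost by about 4x. The first and last layers shrink only about 2x because one side stays fixed. This is why channel-pruning speed-ups are not simply 1 / keep_ratio.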
Taxonomy
Topics: Innovative Teaching and Learning Methods · Online Learning and Analytics
