Meta Pruning via Graph Metanetworks: A Universal Meta Learning Framework for Network Pruning

Yewei Liu; Xiyuan Wang; Muhan Zhang

arXiv:2506.12041 · cs.LG · December 16, 2025

TL;DR

This paper introduces a universal meta-learning framework using graph metanetworks for network pruning, capable of automatically learning complex pruning strategies applicable to various network types without additional training.

Contribution

It presents a novel graph metanetwork approach for network pruning, enabling automatic learning of pruning rules with high generality and transferability across different network architectures.

Findings

01. Achieves state-of-the-art pruning results on CNNs and Transformers.
02. Demonstrates high generality and transferability of the pruning framework.
03. Eliminates the need for special training during pruning.

Abstract

We propose an entirely new meta-learning framework for network pruning. It is a general framework that can, in principle, be applied to almost all types of networks and all kinds of pruning, and it has strong generality and transferability. Experiments show that it achieves outstanding results on many popular and representative pruning tasks (including both CNNs and Transformers). All prior works either rely on fixed, hand-crafted criteria that prune in a coarse manner, or employ learning-to-prune approaches that require special training for each pruning run and lack generality. Unlike them, our framework learns complex pruning rules automatically via a neural network (a metanetwork) and is general enough to prune without any special training. More specifically, we introduce the newly developed idea of metanetworks from meta-learning into pruning. A metanetwork is a network that…
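To make the abstract's idea concrete, here is a toy sketch under explicit assumptions: a small dense MLP is encoded as a graph (one node per neuron, one edge per weight), a single message-passing step with hand-set parameters stands in for the learned metanetwork, and hidden neurons are then pruned by the L2 norm of their incoming weights. All function names (`mlp_to_graph`, `toy_metanetwork_step`, `prune_hidden_neurons`, ...) and the gating rule are hypothetical illustrations, not the authors' architecture or training procedure.

```python
import numpy as np


def mlp_to_graph(weights):
    """Encode a dense MLP as a graph: one node per neuron, one edge per weight.

    weights[l] has shape (out_l, in_l); the edge src -> dst carries weights[l][dst, src].
    Edges are emitted layer by layer in row-major order so they can be mapped back.
    """
    sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = np.cumsum([0] + sizes)  # first node id of each layer
    src, dst, feat = [], [], []
    for l, w in enumerate(weights):
        rows, cols = np.indices(w.shape)
        src.append(offsets[l] + cols.ravel())
        dst.append(offsets[l + 1] + rows.ravel())
        feat.append(w.ravel())
    return int(offsets[-1]), np.concatenate(src), np.concatenate(dst), np.concatenate(feat)


def graph_to_mlp(feat, shapes):
    """Inverse of mlp_to_graph: slice the edge features back into weight matrices."""
    out, start = [], 0
    for rows, cols in shapes:
        out.append(feat[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return out


def toy_metanetwork_step(num_nodes, src, dst, feat, alpha=4.0):
    """Hypothetical stand-in for a learned metanetwork: one hand-set message-passing step.

    Each node sums |w| over its incident edges (normalised to [0, 1]); each edge is then
    rescaled by a gate in (0, 2). Edges whose endpoints have combined strength above 1
    are amplified and the rest are shrunk. A real metanetwork would learn these maps.
    """
    strength = np.zeros(num_nodes)
    np.add.at(strength, src, np.abs(feat))
    np.add.at(strength, dst, np.abs(feat))
    strength /= max(strength.max(), 1e-12)
    gate = 2.0 / (1.0 + np.exp(-alpha * (strength[src] + strength[dst] - 1.0)))
    return feat * gate


def prune_hidden_neurons(w1, w2, keep):
    """Structured pruning of a 2-layer MLP: keep the `keep` hidden neurons whose
    incoming weight vectors have the largest L2 norm and drop the matching columns of w2."""
    scores = np.linalg.norm(w1, axis=1)
    kept = np.sort(np.argsort(scores)[-keep:])
    return w1[kept], w2[:, kept]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))  # a 4 -> 8 -> 3 MLP
    n, src, dst, feat = mlp_to_graph([w1, w2])
    t1, t2 = graph_to_mlp(toy_metanetwork_step(n, src, dst, feat), [w1.shape, w2.shape])
    p1, p2 = prune_hidden_neurons(t1, t2, keep=4)
    print(p1.shape, p2.shape)  # (4, 4) (3, 4)
```

The toy gate changes what the network computes; how accuracy is recovered after the transformation is outside the scope of this sketch.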

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- Introducing a metanetwork yields a reasonable accuracy improvement compared to the no-metanetwork baseline.
- Multiple ablation studies support the effectiveness of the proposed approach.
- The paper is well written and easy to follow.

Weaknesses

- The proposed metanetwork exhibits limited transferability across diverse model architectures and datasets. Although the paper claims transferability as a key advantage (Section 4.4), the transfer experiments are restricted to highly similar settings: architecturally similar networks (ResNet56 vs. ResNet110) and datasets of similar scale and domain (CIFAR10/100/SVHN). To demonstrate the claimed transferability more convincingly, it would be better to include: (1) transfer across substantially diffe…

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The core advantage is that it eliminates the need for specialized training each time a network is pruned: the approach of training a metanetwork can be applied to prune networks of any architecture. It also demonstrates strong generalization when transferring across datasets and architectures. Experimental details are fully described.

Weaknesses

The main text often uses expressions such as "no prior work has done something like this before" and "universally applicable". However, using metanetworks to predict network transformations for pruning is highly similar to the previous paradigm of meta-learning-based pruning. In the numerical experiments, the performance advantage is not significant enough. On ResNet56/CIFAR-10, there is no substantial difference in accuracy compared with DepGraph and ATO, with only some improvement in “Speed Up”; on VGG19…
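The reviewer's comparison rests on “Speed Up”. Structured-pruning papers commonly report it as the ratio of FLOPs (counted as multiply-accumulates) before and after pruning. Assuming that convention, since the exact counting rules are not given on this page, a minimal calculation for a hypothetical plain chain of 3x3 convolutions shows why halving every layer's channels gives close to a 4x speed-up: each layer's cost scales with both its input and its output channel count. The helper names are illustrative, and the sketch ignores strides, batch norm, residual connections, and the classifier head.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def chain_macs(h, w, channels, k=3):
    """Total MACs of a plain chain of same-size convs; channels = [c_in, c_1, ..., c_L]."""
    return sum(conv_macs(h, w, a, b, k) for a, b in zip(channels, channels[1:]))


def speed_up(h, w, channels, keep_ratio, k=3):
    """FLOPs-ratio speed-up after keeping `keep_ratio` of every conv layer's output channels.

    The network's input channels (e.g. RGB) cannot be pruned, so channels[0] is kept.
    """
    pruned = [channels[0]] + [max(1, round(c * keep_ratio)) for c in channels[1:]]
    return chain_macs(h, w, channels, k) / chain_macs(h, w, pruned, k)


if __name__ == "__main__":
    # Hypothetical 3-layer chain on 32 x 32 inputs: 3 -> 16 -> 16 -> 32 channels.
    print(f"{speed_up(32, 32, [3, 16, 16, 32], keep_ratio=0.5):.2f}")  # 3.78
```

Only the first layer falls short of a full 4x reduction, because its RGB input cannot be pruned.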

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The idea of using a meta-network to transform a neural network into a pruning-friendly one is interesting and new to me. The design details of the proposed method are properly executed. Experimental results on three classical CNN pruning tasks and a ViT show good accuracy and speed-up curves. The proposed method is shown to be robust to the choice of pruning criterion and has good transferability across similar datasets and model architectures without retraining the meta-network.

Weaknesses

The main experimental result table shows good numbers, but more baselines from recent structured, unstructured, and hypernetwork pruning approaches would be desirable. The paper shows improved trade-off curves due to the meta-network, but there is limited analysis of which parameter and feature statistics (e.g., channel saliency and inter-layer correlation) change, or of how much each model feature (e.g., BN statistics, kernel encoding, and residual edges) contributes to the performance. In general, I am not tot…
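The reviewer asks which statistics, such as channel saliency, change once the meta-network is applied. One simple way such an analysis could be set up, assuming saliency is measured as the per-output-channel L2 norm of a conv kernel (a common choice, not necessarily the paper's), is to compare how concentrated saliency is before and after the transformation and whether the channel ranking is preserved. The helper names are hypothetical, and the "after" weights in the usage example are a synthetic stand-in, not output from the authors' meta-network.

```python
import numpy as np


def channel_saliency(conv_weight):
    """L2 norm of each output channel of a conv kernel shaped (C_out, C_in, k, k)."""
    return np.linalg.norm(conv_weight.reshape(conv_weight.shape[0], -1), axis=1)


def top_k_share(saliency, k):
    """Fraction of the total squared saliency held by the k most salient channels."""
    sq = np.sort(saliency ** 2)[::-1]
    return float(sq[:k].sum() / sq.sum())


def rank_correlation(a, b):
    """Spearman rank correlation, assuming no ties: Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    before = rng.normal(size=(16, 8, 3, 3))
    # Synthetic stand-in for a transformed layer: rescale channels unevenly.
    after = before * np.linspace(0.1, 2.0, 16)[:, None, None, None]
    s0, s1 = channel_saliency(before), channel_saliency(after)
    print("top-4 share before/after:", top_k_share(s0, 4), top_k_share(s1, 4))
    print("saliency rank correlation:", rank_correlation(s0, s1))
```

A rising top-k share suggests saliency is concentrating on fewer channels, which makes norm-based channel pruning less destructive; a low rank correlation would indicate the transformation reorders which channels look important.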
