ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding

TL;DR
ResRep is a lossless CNN channel pruning method. Inspired by neurobiology research on the independence of remembering and forgetting, it decouples the two to achieve high compression without accuracy loss.
Contribution
It proposes a re-parameterization approach that splits a CNN into remembering and forgetting components during training, then equivalently merges them back into the original architecture with narrower layers, enabling lossless channel pruning.
Findings
Slims ResNet-50 down to 45% of its original FLOPs (roughly a 55% reduction) with no accuracy drop.
Reported by the authors as the first to achieve lossless pruning at such a high compression ratio.
Applies a novel update rule with penalty gradients to the forgetting parts to realize structured sparsity.
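As a rough illustration of what a penalty-gradient update can look like, the sketch below takes SGD-style steps on a single output channel of a forgetting part. It is a sketch under stated assumptions, not the authors' exact rule: the names `penalized_step`, `mask`, `lam`, and `lr` are hypothetical, and the penalty is assumed to be a group-lasso term whose gradient is the channel's weights divided by their Euclidean norm. Setting `mask` to 0 drops the task gradient so that only the penalty acts on that channel.

```python
import math

def l2_norm(row):
    """Euclidean norm of one channel's weights."""
    return math.sqrt(sum(w * w for w in row))

def penalized_step(row, task_grad, mask, lam=0.1, lr=0.1):
    """One update on a single channel (hypothetical helper).

    row       -- weights of one output channel of the forgetting part
    task_grad -- gradient of the task loss w.r.t. those weights
    mask      -- 1 keeps the task gradient, 0 removes it (assumed masking scheme)
    lam       -- penalty strength (assumed group-lasso style)
    """
    norm = l2_norm(row)
    updated = []
    for w, g in zip(row, task_grad):
        # Penalty gradient: lam * w / ||row||, pulling the whole channel toward zero.
        penalty = lam * w / norm if norm > 0 else 0.0
        updated.append(w - lr * (mask * g + penalty))
    return updated

# A channel selected for forgetting: the task gradient is masked out.
row = [0.3, -0.4]
norm_before = l2_norm(row)
for _ in range(10):
    row = penalized_step(row, task_grad=[0.1, 0.1], mask=0)
norm_after = l2_norm(row)
print(norm_before, norm_after)
```

With the task gradient masked, each step shrinks the channel's norm by `lr * lam` while keeping its direction, which is how an entire channel, rather than scattered weights, can be driven toward zero.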
Abstract
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that…
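The equivalent merge into narrower layers can be illustrated with a small pure-Python sketch. It assumes, for illustration only, that the forgetting part is a linear (1x1) layer applied to the output channels of a convolution, that batch normalization is omitted, and that the convolution is represented by its action on one flattened input patch. The names `prune_and_merge`, `forget`, and `eps` are hypothetical and do not come from the paper.

```python
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def prune_and_merge(kernel, forget, eps=1e-5):
    """Drop near-zero rows of the forgetting layer, then fold it into the kernel.

    kernel -- out_ch x (in_ch*k*k) flattened convolution kernel
    forget -- out_ch x out_ch linear layer applied after the convolution
    Returns the narrower merged kernel and the indices of surviving channels.
    """
    keep = [i for i, row in enumerate(forget)
            if sum(w * w for w in row) ** 0.5 >= eps]
    kept_rows = [forget[i] for i in keep]
    return matmul(kept_rows, kernel), keep

random.seed(0)
out_ch, patch_len = 4, 2 * 3 * 3  # 4 output channels, 2 input channels, 3x3 kernel
kernel = [[random.uniform(-1, 1) for _ in range(patch_len)] for _ in range(out_ch)]
forget = [[random.uniform(-1, 1) for _ in range(out_ch)] for _ in range(out_ch)]
forget[1] = [0.0] * out_ch  # pretend training zeroed out channels 1 and 3
forget[3] = [0.0] * out_ch
patch = [random.uniform(-1, 1) for _ in range(patch_len)]

two_step = matvec(forget, matvec(kernel, patch))  # convolution, then forgetting layer
merged, keep = prune_and_merge(kernel, forget)    # single narrower layer
one_step = matvec(merged, patch)
print(keep, len(merged))
```

Because both layers are linear, the surviving outputs of the two-step model match the merged layer up to floating-point error, so the narrower layer loses nothing relative to the model that was trained.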
Taxonomy
Topics: Advanced Neural Network Applications · Image and Signal Denoising Methods · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Stochastic Gradient Descent
