Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks

Shiyu Liu; Rohan Ghosh; John Tan Chong Min; Mehul Motani

arXiv:2212.06144 · cs.LG · January 3, 2023

Open Access

TL;DR

This paper introduces SILO, a theoretically motivated learning rate schedule for neural network pruning that dynamically adjusts the LR to improve generalization, achieving results comparable to exhaustive search methods.

Contribution

The paper provides a theoretical justification for LR schedules in pruning and proposes SILO, a new schedule that adaptively adjusts LR for better performance and generalization.

Findings

1. SILO improves accuracy by 2-4% on ImageNet and CIFAR datasets.
2. SILO matches Oracle performance with lower complexity.
3. Theoretical analysis supports the effectiveness of SILO's LR adjustment.

Abstract

The importance of learning rate (LR) schedules for network pruning has been observed in a few recent works. For example, Frankle and Carbin (2019) highlighted that winning tickets (i.e., accuracy-preserving subnetworks) cannot be found without applying an LR warmup schedule, and Renda, Frankle and Carbin (2020) demonstrated that rewinding the LR to its initial state at the end of each pruning cycle improves performance. In this paper, we go one step further by first providing a theoretical justification for the surprising effect of LR schedules. Next, we propose an LR schedule for network pruning called SILO, which stands for S-shaped Improved Learning rate Optimization. The advantages of SILO over existing state-of-the-art (SOTA) LR schedules are twofold: (i) SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization. Specifically,…
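
The two baseline ideas the abstract cites, LR warmup and rewinding the LR at the end of each pruning cycle, can be illustrated with a small sketch. This is not SILO, whose details are not given in this excerpt; it only shows a warmup-plus-step-decay schedule that restarts every cycle. The function names and every numeric value (base LR, warmup length, decay milestones, cycle length) are hypothetical choices made for illustration.

```python
def warmup_step_lr(step, base_lr=0.1, warmup_steps=5,
                   milestones=(20, 30), gamma=0.1):
    """LR at `step` within one pruning cycle (all defaults are assumptions).

    Linear warmup up to `base_lr`, then multiply by `gamma` at each milestone.
    """
    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Count how many decay milestones have already been passed.
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** drops


def rewound_lr(global_step, steps_per_cycle=40, **schedule_kwargs):
    """LR rewinding: restart the same schedule at every pruning cycle."""
    step_in_cycle = global_step % steps_per_cycle
    return warmup_step_lr(step_in_cycle, **schedule_kwargs)


# Two pruning cycles of 40 steps each; the second repeats the first.
schedule = [rewound_lr(s) for s in range(80)]
```

In an iterative pruning loop, a fraction of the weights would be pruned at each cycle boundary (every `steps_per_cycle` steps here), after which the schedule starts again from its warmup phase.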

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Pruning