Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Tianyi Chen; Bo Ji; Tianyu Ding; Biyi Fang; Guanyi Wang; Zhihui Zhu; Luming Liang; Yixin Shi; Sheng Yi; Xiao Tu

arXiv:2107.07467·cs.LG·November 15, 2021·43 cites

TL;DR

This paper introduces a novel one-shot training and pruning framework for deep neural networks that enables simultaneous training and compression without fine-tuning, achieving state-of-the-art results on multiple benchmarks.

Contribution

The proposed OTO framework combines zero-invariant grouping and a new optimization algorithm to prune networks during training, eliminating the need for fine-tuning after pruning.
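As the abstract describes it, a zero-invariant group is a set of parameters that can be pruned without affecting the output once the whole group is zero. The sketch below illustrates that property on a toy two-layer ReLU network in plain Python. The network, its weights, and the helper `prune_zero_units` are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(weights, bias, x):
    # weights holds one row per output unit
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def two_layer(w1, b1, w2, b2, x):
    return linear(w2, b2, relu(linear(w1, b1, x)))

def prune_zero_units(w1, b1, w2):
    # Keep hidden units whose weight row and bias are not all zero,
    # and drop the matching input columns of the next layer.
    keep = [i for i, (row, b) in enumerate(zip(w1, b1))
            if b != 0.0 or any(v != 0.0 for v in row)]
    return ([w1[i] for i in keep],
            [b1[i] for i in keep],
            [[row[i] for i in keep] for row in w2])

# Hidden unit 1 is a zero group: its weight row and bias are all zero.
w1 = [[0.5, -1.0], [0.0, 0.0], [1.5, 2.0]]
b1 = [0.1, 0.0, -0.3]
w2 = [[1.0, 4.0, -2.0], [0.5, -3.0, 1.0]]
b2 = [0.2, -0.1]
x = [0.7, -0.2]

dense_out = two_layer(w1, b1, w2, b2, x)
w1_p, b1_p, w2_p = prune_zero_units(w1, b1, w2)
pruned_out = two_layer(w1_p, b1_p, w2_p, b2, x)
```

Because the zeroed unit always outputs ReLU(0) = 0, the next layer never sees a contribution from it, so removing the unit and its outgoing column leaves the output unchanged.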

Findings

01. Achieves significant FLOPs reduction and parameter compression.

02. Achieves state-of-the-art results on CIFAR10 and SQuAD, and competitive results on ImageNet.

03. Outperforms standard proximal methods in group sparsity exploration.

Abstract

Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable…
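The abstract does not spell out the HSPG update, so the following is only a rough sketch of one plausible reading of a half-space projection on a single parameter group: take a gradient step on the loss plus a group-norm penalty, then set the group to zero if the trial point leaves the half-space defined by the current group. The function name `half_space_step`, the parameters `lr`, `lam`, and `eps`, and the exact projection test are assumptions made for illustration, not the authors' implementation.

```python
import math

def half_space_step(x_group, grad_group, lr, lam, eps):
    """One illustrative half-space step on a single group (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in x_group))
    if norm == 0.0:
        # Assumption: groups that are already zero stay at zero.
        return list(x_group)
    # Trial step on loss gradient plus gradient of lam * ||x_g||_2.
    trial = [v - lr * (g + lam * v / norm)
             for v, g in zip(x_group, grad_group)]
    # Project to zero if the trial point falls outside the half-space
    # {z : <z, x_g> >= eps * ||x_g||^2}.
    inner = sum(t * v for t, v in zip(trial, x_group))
    if inner < eps * norm * norm:
        return [0.0] * len(x_group)
    return trial

# A small group pushed across the origin by its gradient is zeroed out.
zeroed = half_space_step([0.1, -0.1], [1.0, -1.0], lr=0.5, lam=0.1, eps=0.0)

# A large group with a small gradient stays non-zero and shrinks slightly.
kept = half_space_step([1.0, 2.0], [0.1, 0.1], lr=0.1, lam=0.01, eps=0.0)
```

In this reading, a whole group is set exactly to zero in one step, rather than shrinking gradually as under a plain gradient update, which is what makes the resulting groups prunable.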

Code & Models

Repositories

tianyic/only_train_once
PyTorch · Official

Videos

Only Train Once: A One-Shot Neural Network Training And Pruning Framework · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: Attention Is All You Need · Pruning · Linear Layer · Multi-Head Attention · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Dense Connections