Only Train Once: A One-Shot Neural Network Training And Pruning Framework
Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

TL;DR
This paper introduces a novel one-shot training and pruning framework for deep neural networks that enables simultaneous training and compression without fine-tuning, achieving state-of-the-art results on multiple benchmarks.
Contribution
The proposed OTO framework combines zero-invariant grouping and a new optimization algorithm to prune networks during training, eliminating the need for fine-tuning after pruning.
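To make the zero-invariant grouping idea concrete, here is a minimal sketch on a toy two-layer ReLU network. It assumes a group consists of one hidden unit's weight row plus its bias; the helper names (`two_layer`, `prune_zero_groups`) and all numbers are hypothetical and are not taken from the paper or its code.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    # W is a list of rows; computes W @ x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def two_layer(W1, b1, W2, b2, x):
    return linear(W2, b2, relu(linear(W1, b1, x)))

def prune_zero_groups(W1, b1, W2):
    # Assumed grouping: hidden unit i owns row i of W1 and entry i of b1.
    # If the whole group is zero, the unit outputs relu(0) = 0 for any input,
    # so the unit and the matching column of W2 can be dropped.
    keep = [i for i, (row, bi) in enumerate(zip(W1, b1)) if any(row) or bi != 0.0]
    W1p = [W1[i] for i in keep]
    b1p = [b1[i] for i in keep]
    W2p = [[row[i] for i in keep] for row in W2]
    return W1p, b1p, W2p

# Toy weights; hidden unit 1 is a zero group.
W1 = [[0.5, -1.0], [0.0, 0.0], [2.0, 0.25]]
b1 = [0.1, 0.0, -0.3]
W2 = [[1.0, 3.0, -2.0], [0.5, -4.0, 1.5]]
b2 = [0.0, 1.0]

x = [1.5, -0.7]
full = two_layer(W1, b1, W2, b2, x)
W1p, b1p, W2p = prune_zero_groups(W1, b1, W2)
slim = two_layer(W1p, b1p, W2p, b2, x)
```

In this sketch `full` and `slim` agree even though the pruned network has one fewer hidden unit, which is the property that lets zero groups be removed without a fine-tuning step.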
Findings
Achieves significant FLOPs reduction and parameter compression.
Reports state-of-the-art results on CIFAR10 and SQuAD, and competitive results on ImageNet.
HSPG outperforms standard proximal methods on group sparsity exploration.
Abstract
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable…
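The difference between a standard proximal step and a half-space projection can be sketched as follows. The abstract does not spell out the HSPG update, so `half_space_project` and its `eps` parameter are illustrative assumptions, not the authors' algorithm. The group soft-thresholding in `prox_group_l2` is the standard proximal operator for a sum of group l2 norms. All numbers are made up.

```python
import math

def group_norm(g):
    return math.sqrt(sum(v * v for v in g))

def prox_group_l2(groups, threshold):
    # Standard group soft-thresholding: the proximal operator of
    # threshold * sum_g ||x_g||_2. A group becomes zero only when its
    # norm is at most the threshold.
    out = []
    for g in groups:
        n = group_norm(g)
        scale = max(0.0, 1.0 - threshold / n) if n > 0 else 0.0
        out.append([scale * v for v in g])
    return out

def half_space_project(trial, current, eps=0.0):
    # Assumed rule (hypothetical): zero a group when the trial point falls
    # outside the half-space {z : <z, x_g> >= eps * ||x_g||^2} defined by
    # the current group x_g; otherwise keep the trial point unchanged.
    out = []
    for z, xg in zip(trial, current):
        inner = sum(a * b for a, b in zip(z, xg))
        if inner < eps * sum(b * b for b in xg):
            out.append([0.0] * len(z))
        else:
            out.append(list(z))
    return out

current = [[0.3, -0.4], [2.0, 1.0]]   # two parameter groups
trial = [[-0.1, 0.2], [1.8, 0.9]]     # e.g. the result of a gradient step

shrunk = prox_group_l2(current, threshold=0.6)
projected = half_space_project(trial, current)
```

Here the proximal step zeroes the first group because its norm (0.5) is below the threshold, while the assumed half-space rule zeroes it because the trial point has a negative inner product with the current group. The second group survives both steps.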
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
Methods: Attention Is All You Need · Pruning · Linear Layer · Multi-Head Attention · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Dense Connections
