Training Structured Neural Networks Through Manifold Identification and Variance Reduction
Zih-Syuan Huang, Ching-pei Lee

TL;DR
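This paper introduces RMDA, an algorithm for training structured neural networks with a structure-promoting regularizer. It costs no more computation than proximal SGD with momentum, and, through variance reduction and manifold identification, its iterates provably take on the structure induced by the regularizer after a finite number of iterations. In experiments on structured sparsity it outperforms existing methods.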
Contribution
The paper presents RMDA, a variance-reduction algorithm that promotes structured sparsity in neural networks at no computational cost beyond that of proximal SGD with momentum, and proves that it identifies the desired structure within a finite number of iterations.
Findings
After a finite number of iterations, all RMDA iterates have the structure induced by the regularizer at the stationary point they converge to.
RMDA outperforms existing methods in structured sparsity training.
RMDA surpasses state-of-the-art pruning methods for unstructured sparsity.
Abstract
This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we prove that after a finite number of iterations, all iterates of RMDA possess a desired structure identical to that induced by the regularizer at the stationary point of asymptotic convergence, even in the presence of engineering tricks like data augmentation and dropout that complicate the training process. Experiments on training NNs with structured sparsity confirm that variance reduction is necessary for such an identification, and show that RMDA thus significantly outperforms existing methods for this…
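As a rough illustration of how a regularization term can produce exact structure, the sketch below applies the proximal operator of a group-LASSO penalty to a small weight matrix, treating each row as one group. The choice of group LASSO, the row-wise grouping, and the names `prox_group_lasso`, `threshold`, and `W` are assumptions made for this example; it shows only the proximal step that structured-sparsity methods build on, not the RMDA update itself.

```python
import numpy as np


def prox_group_lasso(weights, threshold):
    """Row-wise group soft-thresholding.

    This is the proximal operator of threshold * sum_g ||w_g||_2,
    where each row of `weights` is treated as one group (an assumption
    made for this sketch).
    """
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    # Each row is scaled by max(0, 1 - threshold / ||w_g||), so any row
    # whose norm is at most `threshold` becomes exactly zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return weights * scale


# Hypothetical weights: one large row, one small row, one already-zero row.
W = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
W_new = prox_group_lasso(W, threshold=1.0)

# Indices of rows that are entirely zero after the proximal step.
zero_rows = np.flatnonzero(~W_new.any(axis=1))
print(W_new)
print(zero_rows)
```

Rows with small norm are set exactly to zero rather than merely shrunk, which is the kind of structure (here, whole groups removed) that the abstract says the iterates come to share with the limit point.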
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Model Reduction and Neural Networks
Methods: Pruning · Dropout · Stochastic Gradient Descent
