Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Chris Kolb, Tobias Weber, Bernd Bischl, David Rügamer

TL;DR
This paper introduces deep weight factorization, a novel approach extending shallow weight decomposition to multiple factors, enabling smooth optimization of sparse neural networks and demonstrating superior performance over existing methods.
Contribution
It presents a new deep weight factorization method, with theoretical analysis, tailored initialization, and learning rate strategies, improving sparse neural network training.
Findings
Deep weight factorization outperforms shallow approaches and pruning methods.
Theoretical equivalence with non-convex sparse regularization is established.
Effective training requires specific initialization and learning rate settings.
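The equivalence finding can be motivated with a short worked derivation. The following is a sketch based on the standard AM-GM inequality, not a reproduction of the paper's proof; the symbols (depth D, factors ω_d, penalty strength λ) and the exact scaling constants are assumptions made for illustration and may differ from the paper's conventions.

```latex
% Single weight written as a product of D factors: w = \omega_1 \omega_2 \cdots \omega_D.
% AM-GM applied to the squared factors:
\sum_{d=1}^{D} \omega_d^2
  \;\ge\; D \Bigl( \prod_{d=1}^{D} \omega_d^2 \Bigr)^{1/D}
  \;=\; D \, |w|^{2/D},
\qquad \text{with equality iff } |\omega_1| = \dots = |\omega_D| .

% Minimizing the smooth L2 penalty over all factorizations of a weight vector w
% (elementwise product \odot) therefore yields a penalty on w itself:
\min_{\omega_1 \odot \cdots \odot \omega_D = w}
  \; \lambda \sum_{d=1}^{D} \lVert \omega_d \rVert_2^2
  \;=\; \lambda D \sum_{j} |w_j|^{2/D} .

% D = 2 gives 2\lambda \lVert w \rVert_1 (the shallow, L1 case);
% D > 2 gives an exponent 2/D < 1, i.e. a non-convex sparsity penalty.
```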
Abstract
Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the L1 norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of L1-penalized neural networks by adding differentiable L2 regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify…
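To make the factorization idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It writes a weight vector as an elementwise product of D factors, computes the smooth squared-L2 penalty on the factors, and checks numerically that a balanced factorization attains the value D * sum_j |w_j|^(2/D). All function names (`collapse`, `factor_l2_penalty`, `balanced_factors`, `induced_penalty`) are hypothetical, and the scaling constants are assumptions that may differ from the paper's.

```python
import numpy as np

def collapse(factors):
    """Elementwise product of the D factor arrays gives the effective weights."""
    w = np.ones_like(factors[0])
    for f in factors:
        w = w * f
    return w

def factor_l2_penalty(factors):
    """Differentiable penalty: sum of squared L2 norms over all factors."""
    return sum(float(np.sum(f ** 2)) for f in factors)

def balanced_factors(w, depth):
    """Hypothetical helper: split w into `depth` factors of equal magnitude.
    The sign of each weight is carried by the first factor only."""
    mag = np.abs(w) ** (1.0 / depth)
    return [np.sign(w) * mag] + [mag.copy() for _ in range(depth - 1)]

def induced_penalty(w, depth):
    """Penalty on the collapsed weights implied by AM-GM: D * sum |w_j|^(2/D)."""
    return depth * float(np.sum(np.abs(w) ** (2.0 / depth)))

w = np.array([0.5, -2.0, 0.0, 1.5])
for depth in (2, 3, 4):
    factors = balanced_factors(w, depth)
    # The factors reproduce the original weights...
    assert np.allclose(collapse(factors), w)
    # ...and the smooth factor penalty matches the induced sparse penalty.
    assert np.isclose(factor_l2_penalty(factors), induced_penalty(w, depth))
```

With `depth=2` the induced penalty is twice the L1 norm of `w`; larger depths give an exponent below 1, which penalizes small nonzero weights more heavily relative to large ones. In actual training the factors would be learned by gradient descent rather than constructed from a known `w`.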
Taxonomy
Topics: Gait Recognition and Analysis · Advanced Computing and Algorithms · Hand Gesture Recognition Systems
Methods: Pruning
