Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization
Aleksandra Irena Nowak, Łukasz Gniecki, Filip Szatkowski, Jacek Tabor

TL;DR
This paper introduces Exact Orthogonal Initialization (EOI), a novel method for sparse neural network initialization that improves training stability and performance by ensuring precise orthogonality, enabling deeper and more efficient sparse models.
Contribution
The paper proposes EOI, a new sparse orthogonal initialization technique based on Givens rotations, providing exact orthogonality and supporting arbitrary layer densities, outperforming existing methods.
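To make the idea of composing Givens rotations concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `random_givens_product`, the `num_rotations` knob, and the uniform sampling of index pairs and angles are assumptions made for illustration, and this summary does not describe how EOI selects rotations to reach a target density. The sketch only shows that a product of Givens rotations stays orthogonal (up to floating-point rounding) while remaining sparse when few rotations are applied.

```python
import numpy as np


def random_givens_product(n, num_rotations, seed=0):
    """Compose random Givens rotations, starting from the identity.

    Hypothetical helper for illustration only; not the EOI algorithm itself.
    """
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(num_rotations):
        # Pick two distinct row indices and a random rotation angle.
        i, j = rng.choice(n, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Left-multiplying by a Givens rotation only mixes rows i and j,
        # so every other row keeps its sparsity pattern.
        row_i, row_j = W[i].copy(), W[j].copy()
        W[i] = c * row_i - s * row_j
        W[j] = s * row_i + c * row_j
    return W


n = 64
W = random_givens_product(n, num_rotations=40)

# Fraction of non-zero weights in the resulting matrix.
density = np.count_nonzero(W) / W.size
# Largest deviation of W^T W from the identity (rounding error only).
orth_error = np.abs(W.T @ W - np.eye(n)).max()

print(f"density = {density:.3f}, max |W^T W - I| = {orth_error:.2e}")
```

Each rotation is itself orthogonal, so the product is orthogonal by construction; more rotations fill in more entries, which is why the number of rotations acts as a rough density control in this sketch.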
Findings
EOI outperforms common sparse initialization techniques.
EOI enables training of very deep sparse networks without residual connections or normalization.
The method is demonstrated on 1000-layer MLP and CNN models.
Abstract
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such a mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask's potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary…
