Improving Neural Network Training in Low Dimensional Random Bases
Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi

TL;DR
This paper improves the training of deep neural networks in low-dimensional random subspaces by re-drawing the random projection at every training step and by applying independent projections to different parts of the network, which gives better optimization performance than keeping a single projection fixed.
Contribution
The paper proposes re-drawing the random subspace at each training step and applying independent projections to different parts of the network, which improves optimization performance and scalability.
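As a rough illustration of what re-drawing the subspace at each step means, the update can be sketched as follows. The symbols are assumptions introduced here and are not taken from this page: \theta_t are the D network parameters, g_t is the mini-batch gradient, \eta is a learning rate, and \varphi_{1,t}, ..., \varphi_{d,t} are d random directions drawn afresh at step t (the Gaussian scaling shown is one possible choice).

```latex
\theta_{t+1} = \theta_t - \eta \sum_{i=1}^{d} \left( \varphi_{i,t}^{\top} g_t \right) \varphi_{i,t},
\qquad g_t = \nabla_\theta L(\theta_t),
\qquad \varphi_{i,t} \sim \mathcal{N}\!\left(0, \tfrac{1}{D} I_D\right)
```

With a fixed projection the directions would not depend on t, so every update would stay inside the same d-dimensional subspace. Applying independent projections to network parts would correspond to using a separate set of directions for each block of \theta.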
Findings
Re-drawing random subspaces each step improves training performance.
Applying independent projections to network parts enhances efficiency.
On-demand pseudo-random projections reduce memory and increase speed.
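The first and third findings can be illustrated with a minimal sketch, assuming a toy quadratic loss rather than a neural network. All names here (`loss`, `grad`, `random_basis`, `train`) and the hyperparameters are hypothetical and chosen for illustration; this is not the authors' implementation. The basis is regenerated from a seed whenever it is needed instead of being stored, and the `redraw` flag switches between a fixed subspace and a freshly drawn one at each step.

```python
import random


def loss(theta):
    # Toy objective (an assumption): 0.5 * ||theta||^2, minimized at the origin.
    return 0.5 * sum(t * t for t in theta)


def grad(theta):
    # Gradient of the toy objective is theta itself.
    return list(theta)


def random_basis(seed, d, n):
    # Regenerate d Gaussian directions in R^n from a seed, so the
    # projection never has to be kept in memory between steps.
    rng = random.Random(seed)
    scale = n ** -0.5  # each direction has roughly unit norm
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(d)]


def train(n=200, d=5, steps=300, lr=0.5, redraw=True, seed=0):
    theta = [1.0] * n
    for t in range(steps):
        # Re-drawing uses a new seed per step; the fixed variant reuses one seed.
        basis_seed = seed + t if redraw else seed
        basis = random_basis(basis_seed, d, n)
        g = grad(theta)
        for phi in basis:
            # Coordinate of the gradient along this random direction.
            c = sum(p * gi for p, gi in zip(phi, g))
            for j in range(n):
                theta[j] -= lr * c * phi[j]
    return loss(theta)


if __name__ == "__main__":
    print("fixed subspace, final loss:   ", train(redraw=False))
    print("re-drawn subspace, final loss:", train(redraw=True))
```

In this toy setting the fixed variant can only reduce the part of the loss that lies inside its 5-dimensional subspace, while the re-drawn variant explores new directions at every step. Independent projections for different network parts could be sketched in the same way by calling `random_basis` separately for each parameter block.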
Abstract
Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly active area of research. Recent work has shown that deep neural networks can be optimized in randomly-projected subspaces of much smaller dimensionality than their native parameter space. While such training is promising for more efficient and scalable optimization schemes, its practical application is limited by inferior optimization performance. Here, we improve on recent random subspace approaches as follows: Firstly, we show that keeping the random projection fixed throughout training is detrimental to optimization. We propose re-drawing the random subspace at each step, which yields significantly better performance. We realize further improvements…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
