Deep equilibrium networks are sensitive to initialization statistics
Atish Agarwala, Samuel S. Schoenholz

TL;DR
The training stability of deep equilibrium networks (DEQs) depends strongly on the statistics of the matrix family used to initialize their weights. Orthogonal or symmetric initializations improve robustness and allow a broader range of initial weight scales.
Contribution
This paper shows how the statistics of the initialization matrices affect DEQ training stability and gives a practical prescription for initializations that permit training over a broader range of initial weight scales.
Findings
DEQs are sensitive to the higher-order statistics of the matrix families they are initialized from, not only to the weight scale.
Orthogonal and symmetric initializations improve DEQ training stability.
With these initializations, DEQs can be trained over a broader range of initial weight scales.
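To make "higher-order statistics of the matrix family" concrete, here is a minimal pure-Python illustration (not the paper's analysis, and the specific statistics the authors study may differ). It builds three hypothetical initializers, `gaussian_matrix`, `orthogonal_matrix` and `symmetric_matrix`, each scaled so that (off-diagonal) entries have variance about σ²/n, and compares normalized traces. The second moment (1/n)tr(WWᵀ) is about σ² for all three families, yet (1/n)tr((WWᵀ)²) is about 2σ⁴ for the Gaussian and symmetric families and exactly σ⁴ for the orthogonal one, while (1/n)tr(W²) is about σ² only for the symmetric family. Matching the weight variance therefore does not make these initializations equivalent.

```python
import math
import random


def gaussian_matrix(n, sigma, rng):
    """Hypothetical initializer: i.i.d. N(0, sigma^2 / n) entries."""
    s = sigma / math.sqrt(n)
    return [[rng.gauss(0.0, s) for _ in range(n)] for _ in range(n)]


def orthogonal_matrix(n, sigma, rng):
    """Hypothetical initializer: sigma * Q, Q from modified Gram-Schmidt on a Gaussian matrix."""
    q = []
    for v in gaussian_matrix(n, 1.0, rng):
        v = v[:]
        for u in q:  # remove the components along the rows accepted so far
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        q.append([a / norm for a in v])
    return [[sigma * a for a in row] for row in q]


def symmetric_matrix(n, sigma, rng):
    """Hypothetical initializer: (A + A^T) / sqrt(2) with A Gaussian."""
    a = gaussian_matrix(n, sigma, rng)
    c = 1.0 / math.sqrt(2.0)
    return [[c * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]


def moments(w):
    """Normalized traces (1/n) tr(W W^T), (1/n) tr((W W^T)^2) and (1/n) tr(W^2)."""
    n = len(w)
    wt = [list(col) for col in zip(*w)]  # wt[i][j] == w[j][i]
    s = [[sum(a * b for a, b in zip(ri, rj)) for rj in w] for ri in w]  # S = W W^T
    return {
        "m2": sum(x * x for row in w for x in row) / n,
        # S is symmetric, so tr(S^2) is the sum of its squared entries.
        "m4": sum(x * x for row in s for x in row) / n,
        "tr_w2": sum(w[i][j] * wt[i][j] for i in range(n) for j in range(n)) / n,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    n, sigma = 100, 1.0
    for name, make in [("gaussian", gaussian_matrix),
                       ("orthogonal", orthogonal_matrix),
                       ("symmetric", symmetric_matrix)]:
        m = moments(make(n, sigma, rng))
        print(name, {k: round(v, 3) for k, v in m.items()})
```

With n = 100 the printed values should sit near the limits quoted above, up to finite-size corrections of order 1/n.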
Abstract
Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.
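The "repeated application of a single set of weights" means a DEQ's forward pass looks for a fixed point z* = f(z*) of one weight-tied layer rather than passing through a stack of distinct layers. The sketch below is an illustration only: it assumes a hypothetical layer f(z) = tanh(Wz + x) and naive fixed-point iteration (practical DEQ implementations often use more sophisticated root finders), and it takes W to be σ times a random permutation matrix, an orthogonal matrix chosen only because it is trivial to build. Because tanh is 1-Lipschitz and this W has norm σ, f is a contraction for σ < 1 and the iteration is guaranteed to converge; for σ ≥ 1 that guarantee is lost, which hints at why the initial weight scale matters.

```python
import math
import random


def deq_layer(w, x, z):
    """One application of the weight-tied layer f(z) = tanh(W z + x)."""
    return [math.tanh(sum(wij * zj for wij, zj in zip(row, z)) + xi)
            for row, xi in zip(w, x)]


def solve_fixed_point(w, x, tol=1e-8, max_iter=1000):
    """Naive iteration z <- f(z) from z = 0; returns (z, iterations, converged)."""
    z = [0.0] * len(x)
    for k in range(1, max_iter + 1):
        z_next = deq_layer(w, x, z)
        step = max(abs(a - b) for a, b in zip(z_next, z))  # sup-norm update size
        z = z_next
        if step < tol:
            return z, k, True
    return z, max_iter, False


def scaled_permutation(n, sigma, rng):
    """sigma * P for a random permutation matrix P (orthogonal, norm sigma)."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [[sigma if j == perm[i] else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for sigma in (0.5, 0.9, 1.5):
        w = scaled_permutation(n, sigma, rng)
        _, iters, converged = solve_fixed_point(w, x)
        print(f"sigma={sigma}: converged={converged} after {iters} iterations")
```

Typically the iteration count rises as σ approaches 1; above 1 the naive solver may or may not converge, depending on the matrix and the input.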
