LDLT L-Lipschitz Network Weight Parameterization Initialization
Marius F. R. Juston, Ramavarapu S. Sreenivas, Dustin Nottage, Ahmet Soylemezoglu

TL;DR
This paper provides a detailed theoretical analysis of initialization dynamics for LDLT-based Lipschitz neural networks, deriving exact output variance formulas and offering practical initialization guidelines to mitigate information loss.
Contribution
It introduces a precise variance analysis for LDLT-Lipschitz layers, deriving closed-form expressions and providing practical initialization strategies based on theoretical insights.
Findings
Exact marginal output variance derived for Gaussian-initialized weights.
New parameterization with larger scaling improves output variance.
Empirical validation shows theoretical variance preservation, but He initialization performs better in practice.
Abstract
We analyze initialization dynamics for LDLT-based L-Lipschitz layers by deriving the exact marginal output variance when the underlying parameter matrix is initialized with IID Gaussian entries. The Wishart distribution used for computing the output marginal variance is derived in closed form using expectations of zonal polynomials via James' theorem and a Laplace-integral expansion. We develop an Isserlis/Wick-based combinatorial expansion and provide explicit truncated moments, which yield accurate series approximations in the small-to-moderate regime. Monte Carlo experiments confirm the theoretical estimates. Furthermore, empirical…
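The abstract's setup (a parameter matrix with IID Gaussian entries, whose Gram matrix is Wishart distributed) can be illustrated with a small Monte Carlo check. The sketch below is not the paper's variance formula; it only verifies the standard Wishart first moment, E[W Wᵀ] = n σ² I_m, for a matrix W of shape m × n with N(0, σ²) entries. The names `W`, `m`, `n`, `sigma`, and the helper `mean_gram_matrix` are assumptions introduced for illustration and do not come from the paper.

```python
import random


def mean_gram_matrix(m, n, sigma, trials, seed=0):
    """Monte Carlo estimate of E[W W^T] for W (m x n) with IID N(0, sigma^2) entries.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    acc = [[0.0] * m for _ in range(m)]
    for _ in range(trials):
        # Draw one Gaussian parameter matrix W.
        w = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]
        # Accumulate the Gram (Wishart) matrix S = W W^T.
        for i in range(m):
            for j in range(m):
                acc[i][j] += sum(w[i][k] * w[j][k] for k in range(n))
    return [[v / trials for v in row] for row in acc]


m, n, sigma = 3, 5, 0.5  # assumed sizes, chosen only to keep the demo fast
estimate = mean_gram_matrix(m, n, sigma, trials=20000)
expected_diag = n * sigma ** 2  # Wishart mean: E[W W^T] = n * sigma^2 * I_m

for row in estimate:
    print(["%.3f" % v for v in row])
print("expected diagonal:", expected_diag)
```

With these assumed sizes the diagonal estimates should sit close to n σ² and the off-diagonal estimates close to zero. The same sampling loop could be pointed at any other statistic of the Gram matrix.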
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Statistical Mechanics and Entropy · Neural Networks and Applications
