LDLT L-Lipschitz Network Weight Parameterization Initialization

Marius F. R. Juston; Ramavarapu S. Sreenivas; Dustin Nottage; Ahmet Soylemezoglu

arXiv:2601.08253 · cs.LG · January 14, 2026

Open Access

TL;DR

This paper provides a detailed theoretical analysis of initialization dynamics for LDLT-based Lipschitz neural networks, deriving exact output variance formulas and offering practical initialization guidelines to mitigate information loss.

Contribution

The paper introduces an exact variance analysis for LDLT-based $L$-Lipschitz layers, deriving closed-form expressions and practical initialization strategies grounded in the theory.

Findings

1. The exact marginal output variance is derived for Gaussian-initialized weights.
2. A new parameterization with larger scaling improves the output variance.
3. Empirical validation confirms the theoretically predicted variance preservation, but He initialization performs better in practice.
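Finding 3 compares against He initialization. For reference, He (Kaiming) normal initialization draws weights from $\mathcal{N}(0, 2/\text{fan\_in})$. The minimal sketch below applies it to the parameter matrix $W_0 \in \mathbb{R}^{m \times n}$, assuming $\text{fan\_in} = n$. That fan-in convention for the LDLT parameter matrix is an assumption, and the helpers `he_normal_variance` and `init_w0_he` are hypothetical names used only for illustration. Under this choice, $\sigma^2 = 2/n$, so $\mathbb{E}[\operatorname{tr}(W_0 W_0^\top)] = \sigma^2 m n = 2m$.

```python
import numpy as np


def he_normal_variance(fan_in):
    # He (Kaiming) normal initialization for ReLU networks: sigma^2 = 2 / fan_in.
    return 2.0 / fan_in


def init_w0_he(m, n, seed=0):
    # Hypothetical helper. Assumption: W0 in R^{m x n} acts on n inputs, so fan_in = n.
    sigma2 = he_normal_variance(n)
    rng = np.random.default_rng(seed)
    w0 = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))
    return w0, sigma2


w0, sigma2 = init_w0_he(64, 128)
# With IID N(0, sigma2) entries, E[tr(W0 W0^T)] = sigma2 * m * n = 2 * m = 128 here.
print(sigma2, np.trace(w0 @ w0.T))
```

The sampled trace fluctuates around 128 with a standard deviation of $\sigma^2\sqrt{2mn} = 2$.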

Abstract

We analyze initialization dynamics for LDLT-based $L$-Lipschitz layers by deriving the exact marginal output variance when the underlying parameter matrix $W_0 \in \mathbb{R}^{m \times n}$ is initialized with IID Gaussian entries $\mathcal{N}(0, \sigma^2)$. The Wishart distribution, $S = W_0 W_0^\top \sim \mathcal{W}_m(n, \sigma^2 I_m)$, used for computing the output marginal variance, is derived in closed form using expectations of zonal polynomials via James' theorem and a Laplace-integral expansion of $(\alpha I_m + S)^{-1}$. We develop an Isserlis/Wick-based combinatorial expansion for $\mathbb{E}[\operatorname{tr}(S^k)]$ and provide explicit truncated moments up to $k = 10$, which yield accurate series approximations for small-to-moderate $\sigma^2$. Monte Carlo experiments confirm the theoretical estimates. Furthermore, empirical…
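The kind of Monte Carlo check the abstract describes can be sketched on a small scale. The sketch below assumes the standard Wishart trace moments for $k \le 3$ (not the paper's expansion up to $k = 10$). It also uses the series $\mathbb{E}[\operatorname{tr}((\alpha I_m + S)^{-1})] \approx \sum_k (-1)^k \mathbb{E}[\operatorname{tr}(S^k)]/\alpha^{k+1}$, which follows from writing $(\alpha I_m + S)^{-1} = \int_0^\infty e^{-t(\alpha I_m + S)}\,dt$ and integrating the power series of $e^{-tS}$ term by term. This is one plausible reading of the Laplace-integral step and may differ from the paper's derivation. The sketch does not compute the LDLT layer's output variance itself, since that parameterization is not given here. All function names are hypothetical.

```python
import numpy as np


def exact_trace_moments(m, n, sigma2):
    # Standard Wishart results for S ~ W_m(n, sigma2 * I_m), k = 1, 2, 3.
    return {
        1: sigma2 * m * n,
        2: sigma2**2 * m * n * (m + n + 1),
        3: sigma2**3 * m * n * (m**2 + n**2 + 3 * m * n + 3 * m + 3 * n + 4),
    }


def monte_carlo(m, n, sigma2, alpha, trials=20_000, seed=0):
    rng = np.random.default_rng(seed)
    # A batch of parameter matrices W0 with IID N(0, sigma2) entries.
    w0 = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m, n))
    s = w0 @ np.swapaxes(w0, 1, 2)  # S = W0 W0^T, shape (trials, m, m)
    s2 = s @ s
    s3 = s2 @ s

    def tr(a):
        # Batched trace over the last two axes.
        return np.trace(a, axis1=1, axis2=2)

    moments = {1: tr(s).mean(), 2: tr(s2).mean(), 3: tr(s3).mean()}
    inv = np.linalg.inv(alpha * np.eye(m) + s)  # batched inverse of alpha I + S
    return moments, tr(inv).mean()


def series_trace_inverse(m, alpha, moments):
    # Truncated series: sum_k (-1)^k E[tr S^k] / alpha^(k+1); the k = 0 term is
    # tr(I_m) / alpha = m / alpha. Accurate only while ||S|| is well below alpha.
    total = m / alpha
    for k, mk in moments.items():
        total += (-1) ** k * mk / alpha ** (k + 1)
    return total


m, n, sigma2, alpha = 4, 8, 0.01, 1.0
exact = exact_trace_moments(m, n, sigma2)
mc_moments, mc_inv = monte_carlo(m, n, sigma2, alpha)
approx = series_trace_inverse(m, alpha, exact)
print(exact, mc_moments)
print(approx, mc_inv)
```

With these settings the truncated series evaluates to $4 - 0.32 + 0.0416 - 0.006912 = 3.714688$. The Monte Carlo trace of the inverse should agree to within roughly the size of the omitted $k = 4$ term. Increasing $\sigma^2$ makes the truncation error grow, consistent with the series being accurate only for small-to-moderate $\sigma^2$.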

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Mechanics and Entropy · Neural Networks and Applications