On the dynamics of three-layer neural networks: initial condensation
Zheng-An Chen, Tao Luo

TL;DR
This paper investigates the condensation phenomenon in three-layer neural networks, revealing the mechanisms behind it and its relation to low-rank biases, supported by theoretical analysis and experiments.
Contribution
It provides a rigorous theoretical analysis of condensation in three-layer networks, distinguishes it from the two-layer case, and relates it to the low-rank bias observed in deep matrix factorization.
Findings
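The paper gives a sufficient condition for condensation to occur in three-layer networks trained from small initialization.
Theoretical analysis establishes the blow-up property of the effective dynamics.
Experimental results substantiate the theoretical findings on the condensation mechanism.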
Abstract
Empirical and theoretical works show that the input weights of two-layer neural networks, when initialized with small values, converge towards isolated orientations. This phenomenon, referred to as condensation, indicates that gradient descent methods tend to spontaneously reduce the complexity of neural networks during the training process. In this work, we elucidate the mechanisms behind the condensation phenomenon occurring in the training of three-layer neural networks and distinguish it from that in the training of two-layer neural networks. Through rigorous theoretical analysis, we establish the blow-up property of the effective dynamics and present a sufficient condition for the occurrence of condensation; these findings are substantiated by experimental results. Additionally, we explore the association between condensation and the low-rank bias observed in deep matrix factorization.
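To make "converge towards isolated orientations" concrete, the sketch below shows one way condensation could be measured: by the pairwise cosine similarity between the input weight vectors of hidden neurons. This is an illustrative assumption, not the paper's own procedure. The function names `cosine_similarity_matrix` and `condensed_fraction`, the threshold `tol`, and the example weights are all hypothetical, and treating antiparallel vectors as the same orientation is an assumption that suits odd activations such as tanh.

```python
import math


def cosine_similarity_matrix(weights):
    """Pairwise cosine similarity between the rows of `weights`.

    Each row is taken to be the input weight vector of one hidden neuron.
    """
    norms = [math.sqrt(sum(x * x for x in w)) for w in weights]
    if any(n == 0.0 for n in norms):
        raise ValueError("a zero weight vector has no orientation")
    return [
        [sum(a * b for a, b in zip(wi, wj)) / (ni * nj)
         for wj, nj in zip(weights, norms)]
        for wi, ni in zip(weights, norms)
    ]


def condensed_fraction(weights, tol=0.99):
    """Fraction of distinct neuron pairs with |cosine similarity| >= tol.

    Assumption: parallel and antiparallel vectors count as one orientation.
    """
    sim = cosine_similarity_matrix(weights)
    n = len(weights)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        raise ValueError("need at least two neurons")
    aligned = sum(1 for i, j in pairs if abs(sim[i][j]) >= tol)
    return aligned / len(pairs)


# Hypothetical weights: three neurons sharing one orientation (up to sign).
condensed = [[1.0, 2.0], [2.0, 4.0], [-0.5, -1.0]]
# Hypothetical weights: three neurons pointing in distinct directions.
spread = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(condensed_fraction(condensed))
print(condensed_fraction(spread))
```

A fraction near 1 would indicate that most neurons share a single orientation, while a value near 0 would indicate that the weight vectors remain spread out. In practice the same measurement could be applied to the weight matrix of each layer at successive training steps.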
Taxonomy
Topics: Neural Networks and Applications
