On the dynamics of three-layer neural networks: initial condensation

Zheng-An Chen; Tao Luo

arXiv:2402.15958 · cs.LG · February 28, 2024 · 1 citation

Open Access

TL;DR

This paper investigates the condensation phenomenon in three-layer neural networks, revealing the mechanisms behind it and its relation to low-rank biases, supported by theoretical analysis and experiments.

Contribution

The paper provides a rigorous theoretical analysis of condensation in three-layer networks, distinguishes it from condensation in two-layer networks, and links it to the low-rank bias observed in deep matrix factorization.

Findings

01 A sufficient condition is given under which condensation occurs in three-layer networks.

02 Theoretical analysis establishes the blow-up property of the effective dynamics.

03 Experimental results substantiate the condensation mechanism.

Abstract

Empirical and theoretical works show that the input weights of two-layer neural networks, when initialized with small values, converge towards isolated orientations. This phenomenon, referred to as condensation, indicates that gradient descent methods tend to spontaneously reduce the complexity of neural networks during the training process. In this work, we elucidate the mechanisms behind the condensation phenomena occurring in the training of three-layer neural networks and distinguish them from those in the training of two-layer neural networks. Through rigorous theoretical analysis, we establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation, findings that are substantiated by experimental results. Additionally, we explore the association between condensation and the low-rank bias observed in deep matrix factorization.
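
The condensation described above, where input weights align along a few isolated orientations, can be probed numerically. The following is a minimal sketch, not the paper's experimental setup: it trains a toy two-layer tanh network from a small initialization with full-batch gradient descent and measures how closely the hidden neurons' input weight vectors line up. The data, the activation, the hyperparameters, and the names `cosine_similarity`, `condensation_score`, and `train_two_layer` are all assumptions made for illustration.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def condensation_score(weights):
    """Mean absolute pairwise cosine similarity of the weight vectors.

    A value of 1.0 means every vector lies on a single line through the
    origin; smaller values mean the orientations are more spread out.
    """
    sims = [
        abs(cosine_similarity(weights[i], weights[j]))
        for i in range(len(weights))
        for j in range(i + 1, len(weights))
    ]
    return sum(sims) / len(sims)


def train_two_layer(n_hidden=8, init_scale=1e-3, steps=300, lr=0.05, seed=0):
    """Train f(x) = sum_k a_k * tanh(w_k . x) and return the input weights w.

    Hypothetical toy setup: 20 random 2-D inputs, a made-up smooth target,
    and Gaussian initialization with standard deviation `init_scale`.
    """
    rng = random.Random(seed)
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
    ys = [math.sin(2 * x1) + 0.5 * x2 for x1, x2 in xs]
    w = [[rng.gauss(0, init_scale), rng.gauss(0, init_scale)] for _ in range(n_hidden)]
    a = [rng.gauss(0, init_scale) for _ in range(n_hidden)]

    for _ in range(steps):
        grad_w = [[0.0, 0.0] for _ in range(n_hidden)]
        grad_a = [0.0] * n_hidden
        for x, y in zip(xs, ys):
            h = [math.tanh(wk[0] * x[0] + wk[1] * x[1]) for wk in w]
            err = sum(ak * hk for ak, hk in zip(a, h)) - y
            for k in range(n_hidden):
                # Gradients of the mean squared error 0.5 * err**2.
                grad_a[k] += err * h[k] / len(xs)
                g = err * a[k] * (1 - h[k] ** 2) / len(xs)
                grad_w[k][0] += g * x[0]
                grad_w[k][1] += g * x[1]
        for k in range(n_hidden):
            a[k] -= lr * grad_a[k]
            w[k][0] -= lr * grad_w[k][0]
            w[k][1] -= lr * grad_w[k][1]
    return w


if __name__ == "__main__":
    before = train_two_layer(steps=0)  # steps=0 returns the initial weights
    after = train_two_layer()
    print("score at initialization:", condensation_score(before))
    print("score after training:   ", condensation_score(after))
```

Comparing the score at `steps=0` with the score after a few hundred steps, and repeating with a larger `init_scale`, gives a rough picture of how initialization size affects the alignment of the weights. A three-layer variant would need an extra weight matrix and is not attempted here.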

Taxonomy

Topics: Neural Networks and Applications