Early learning of the optimal constant solution in neural networks and humans

Jirko Rubruck; Jan P. Bauer; Andrew Saxe; Christopher Summerfield

arXiv:2406.17467 · cs.LG · June 26, 2024 · 1 citation

Open Access

TL;DR

This paper shows that both neural networks and humans first learn the optimal constant solution (OCS), a response that reflects only the distribution of target labels, before they begin to use information in the input. This suggests that the early learning phase is universal.
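The OCS is the single output that minimizes the expected loss when the input is ignored. The statement below uses the standard definition with illustrative notation, which may differ from the paper's. It covers squared error and cross-entropy over K classes; for cross-entropy, c is restricted to probability vectors.

```latex
\begin{aligned}
c^{\star} &= \arg\min_{c}\; \mathbb{E}_{(x,y)}\big[\ell(y, c)\big]
  && \text{(the output } c \text{ does not depend on } x\text{)}\\
\ell(y, c) = \tfrac{1}{2}\lVert y - c \rVert^{2}
  \;&\Rightarrow\; c^{\star} = \mathbb{E}[y]
  && \text{(squared error: mean target)}\\
\ell(y, c) = -\log c_{y}
  \;&\Rightarrow\; c^{\star}_{k} = p(y = k)
  && \text{(cross-entropy: label frequencies)}
\end{aligned}
```

In both cases c* depends only on the label distribution. A model sitting at the OCS therefore gives the same response to every input.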

Contribution

The study presents a theoretical and empirical analysis of the early OCS phase in neural networks and humans. It demonstrates that the phase is universal and identifies its mechanistic basis.

Findings

01

Neural networks go through an early phase in which they learn the OCS, with outputs that mirror the label distribution.

02

Humans show signatures of the OCS in early learning dynamics over three days.

03

The OCS emerges even in networks without bias terms, driven by correlations in the input data.
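The third finding can be illustrated with a small simulation. This is a toy built on assumptions, not the paper's setup, and every name and setting in it is hypothetical:

- a two-layer linear network with no bias terms;
- full-batch gradient descent on squared error;
- a four-item hierarchical task;
- inputs that are one-hot codes plus a large constant shared by every item (`offset = 5`, loosely analogous to the non-negative pixel intensities that all images share).

The shared input component dominates the input-output correlation. As a result, the first mode the network learns maps that component to the mean target, which yields a near-constant, OCS-like response before any item-specific structure is learned.

```python
import numpy as np

# Hypothetical toy task (an assumption, not the paper's data): 4 items,
# 3 hierarchical output features, and NO bias terms anywhere in the network.
offset = 5.0                                     # assumed input component shared by all items
X = np.eye(4) + offset                           # column i: one-hot code of item i plus offset
Y = np.array([[1, 1, 1, 1],                      # feature shared by all items
              [1, 1, 0, 0],                      # feature shared by items 0 and 1
              [1, 0, 0, 0]], dtype=float)        # feature unique to item 0
n_items = X.shape[1]
ocs = Y.mean(axis=1, keepdims=True)              # OCS under squared error: mean target

rng = np.random.default_rng(0)
hidden, init_scale, lr = 8, 1e-3, 0.005          # assumed hyperparameters (small init)
W1 = init_scale * rng.standard_normal((hidden, 4))
W2 = init_scale * rng.standard_normal((3, hidden))

snapshots = {}
for step in range(20001):
    H = W1 @ X                                   # hidden layer, no bias
    Y_hat = W2 @ H                               # outputs, one column per item, no bias
    if step in (200, 20000):
        snapshots[step] = Y_hat.copy()
    E = (Y_hat - Y) / n_items                    # d(0.5 * mean squared error) / d(Y_hat)
    G = W2.T @ E                                 # error back-propagated to the hidden layer
    W2 -= lr * (E @ H.T)
    W1 -= lr * (G @ X.T)

early, late = snapshots[200], snapshots[20000]
print(np.round(early, 3))  # columns nearly identical and close to `ocs`
print(np.round(late, 3))   # columns match Y
```

With these assumed settings, the outputs at step 200 should sit near `ocs` for every item, and by step 20000 they should match `Y`. Setting `offset = 0` in this toy removes the shared component. The first mode learned is then no longer a constant response.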

Abstract

Deep neural networks learn increasingly complex functions over the course of training. Here, we show both empirically and theoretically that learning of the target function is preceded by an early phase in which networks learn the optimal constant solution (OCS): initial model responses mirror the distribution of target labels while entirely ignoring information provided in the input. Using a hierarchical category learning task, we derive exact solutions for learning dynamics in deep linear networks trained with bias terms. Even when initialized to zero, this simple architectural feature induces substantial changes in early dynamics. We identify hallmarks of this early OCS phase and illustrate how these signatures are observed in deep linear networks and larger, more complex (and nonlinear) convolutional neural networks solving a hierarchical learning task based on MNIST and…
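Why bias terms produce an early OCS phase can be seen in a small sketch. This is a hypothetical toy, not the paper's code or exact task. It uses:

- a two-layer linear network with zero-initialized biases and small random weights;
- full-batch gradient descent on squared error;
- four items with one-hot inputs and three hierarchical output features.

The gradient on the output bias is the mean error, which is large from the first step. The weight gradients, by contrast, are scaled by the tiny initial weights. The output bias therefore converges to the mean target (the OCS under squared error) long before the weights pick up any input-dependent structure.

```python
import numpy as np

# Hypothetical toy hierarchical task (assumed for illustration, not the paper's dataset):
# 4 items with one-hot inputs and 3 output features of increasing specificity.
X = np.eye(4)                                   # inputs: column i is item i
Y = np.array([[1, 1, 1, 1],                     # feature shared by all items
              [1, 1, 0, 0],                     # feature shared by items 0 and 1
              [1, 0, 0, 0]], dtype=float)       # feature unique to item 0
n_items = X.shape[1]
ocs = Y.mean(axis=1, keepdims=True)             # OCS under squared error: mean target

rng = np.random.default_rng(0)
hidden, init_scale, lr = 8, 1e-3, 0.1           # assumed hyperparameters (small init)
W1 = init_scale * rng.standard_normal((hidden, 4))
W2 = init_scale * rng.standard_normal((3, hidden))
b1 = np.zeros((hidden, 1))                      # bias terms initialized to zero
b2 = np.zeros((3, 1))

snapshots = {}
for step in range(5001):
    H = W1 @ X + b1                             # linear hidden layer
    Y_hat = W2 @ H + b2                         # network outputs, one column per item
    if step in (50, 5000):
        snapshots[step] = Y_hat.copy()
    E = (Y_hat - Y) / n_items                   # d(0.5 * mean squared error) / d(Y_hat)
    G = W2.T @ E                                # error back-propagated to the hidden layer
    W2 -= lr * (E @ H.T)
    b2 -= lr * E.sum(axis=1, keepdims=True)     # mean error: large from the first step
    W1 -= lr * (G @ X.T)
    b1 -= lr * G.sum(axis=1, keepdims=True)

early, late = snapshots[50], snapshots[5000]
print(np.round(early, 3))  # every column is close to `ocs`: the input is ignored
print(np.round(late, 3))   # columns now match Y: the input is used
```

In this toy, the early snapshot is effectively `b2` alone. The input-dependent part `W2 @ W1 @ X` only grows later, at a rate set by the centered input-output correlations.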

Taxonomy

Topics: Neural Networks and Applications