From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clémentine C. J. Dominé, Nicolas Anguita, Alexandra M. Proca, Lukas Braun, Daniel Kunin, Pedro A. M. Mediano, Andrew M. Saxe

arXiv:2409.14623·cs.LG·March 5, 2025

TL;DR

This paper provides an exact analysis of how weight initialization affects learning dynamics in deep linear networks, bridging the gap between lazy and rich regimes and offering insights relevant to neuroscience and machine learning.

Contribution

It derives exact solutions for initialization-dependent learning dynamics, illuminating the transition between lazy and rich regimes in deep linear networks.

Findings

01 Exact solutions for weight initialization effects
02 Representation evolution across regimes
03 Implications for transfer and continual learning

Abstract

Biological and artificial neural networks develop internal representations that enable them to perform complex tasks. In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations, a process influenced by interactions among datasets, architectures, initialization strategies, and optimization algorithms. Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature-learning regime, where representations evolve dynamically. Here, we examine how initialization influences learning dynamics in deep linear neural networks, deriving exact solutions for $\lambda$-balanced initializations, defined by the relative scale of weights across layers. These solutions capture the evolution of representations and the Neural Tangent Kernel across the spectrum…
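To make "relative scale of weights across layers" concrete, here is a minimal sketch of a two-layer linear network f(x) = W2 W1 x. It assumes the balance convention W2^T W2 - W1 W1^T = $\lambda$ I (the abstract does not state the formula, so the exact form and sign are assumptions here), square d x d weights, and whitened inputs taken as X = I. The helpers `lambda_balanced_init`, `balance`, and `train` are hypothetical names for illustration, not the authors' code. The sketch builds weights satisfying the condition via shared orthogonal factors and then checks the known property that this balance matrix is conserved under gradient flow on the squared loss (only approximately under discrete gradient descent).

```python
import numpy as np


def lambda_balanced_init(d, lam, seed=0):
    """Hypothetical helper: square d x d weights with W2.T @ W2 - W1 @ W1.T = lam * I."""
    rng = np.random.default_rng(seed)
    # Random orthogonal matrices from QR decompositions of Gaussian matrices.
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    a = rng.uniform(0.5, 1.5, size=d)  # singular values of W1
    if lam < 0:
        a = np.sqrt(a**2 - lam)  # enlarge a so that a**2 + lam stays positive
    b = np.sqrt(a**2 + lam)  # singular values of W2, chosen so b**2 - a**2 = lam
    W1 = R @ np.diag(a) @ V.T  # hidden x input
    W2 = Q @ np.diag(b) @ R.T  # output x hidden; shares the factor R with W1
    return W1, W2


def balance(W1, W2):
    """Assumed balance matrix W2^T W2 - W1 W1^T (conserved under gradient flow)."""
    return W2.T @ W2 - W1 @ W1.T


def train(W1, W2, Y, lr=1e-3, steps=2000):
    """Plain gradient descent on 0.5 * ||W2 @ W1 - Y||_F^2, i.e. whitened inputs X = I."""
    losses = []
    for _ in range(steps):
        E = W2 @ W1 - Y  # residual of the end-to-end linear map
        losses.append(0.5 * np.sum(E**2))
        gW1 = W2.T @ E  # dL/dW1
        gW2 = E @ W1.T  # dL/dW2
        W1, W2 = W1 - lr * gW1, W2 - lr * gW2
    return W1, W2, losses


d, lam = 4, 1.0
W1, W2 = lambda_balanced_init(d, lam)
Y = np.random.default_rng(1).standard_normal((d, d))  # arbitrary target map
W1_T, W2_T, losses = train(W1, W2, Y)

# Deviation from lam * I at initialization (should be near machine precision).
print(np.abs(balance(W1, W2) - lam * np.eye(d)).max())
# Deviation after training (small but nonzero, from the discrete step size).
print(np.abs(balance(W1_T, W2_T) - lam * np.eye(d)).max())
print(losses[0], losses[-1])
```

The first-order terms of each update cancel in W2^T W2 - W1 W1^T, so under discrete steps the balance matrix drifts only by O(lr^2) per step; shrinking `lr` shrinks the second printed value. Varying `lam` while keeping the end-to-end map fixed is one way to explore the lazy-to-rich spectrum the abstract describes.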

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Originality: The paper builds on existing work by extending the zero-balanced condition to $\lambda$-balanced scenarios, leading to a continuum from the lazy to the rich regime in terms of the weight structure of two consecutive layers. Quality: Solid theoretical derivations backed by rich simulation results. Clarity: The paper is very well written. Significance: The $\lambda$-balanced condition discussed in the paper covers everything from architecture shapes to initialization schemes and relates these structur…

Weaknesses

The only weakness of the paper is its weak demonstration of applications.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper is very well structured and the theoretical results are clearly written. 2. It is a novel and interesting idea to model the range of dynamics from lazy to rich with the balance parameter $\lambda$. 3. The authors analyzed their theoretical results on interesting semantic task examples, and showed multiple implications of their theory on other learning paradigms.

Weaknesses

1. The technical and theoretical novelty of this work is limited: the method used in this paper was mostly established in Braun et al. 2022, and this work extends it to the $\lambda$-balanced case, which is more general. However, the extension is achieved by enforcing a stricter assumption that the input is whitened, limiting the applicability of the results. 2. This paper makes several very strong assumptions, such as task-aligned initialization, whitened inputs, linear networks, etc. It would be

Reviewer 03 · Rating 8 · Confidence 4

Strengths

• This work is novel and timely. • Strong theoretical foundation: the approach extends a previous theoretical framework (Braun et al., 2022) to derive exact solutions for NTK, representation similarity, gradient flow, and loss. This allows for precise determination of how initialization (specifically, the $\lambda$ parameter) influences representation and learning dynamics under the given assumptions. However, I did not have the bandwidth to verify the proofs in the appendix, which affects my c

Weaknesses

• This work relies on a list of assumptions and focuses on a simple two-layer linear feedforward network, which deviates from real-world settings. However, this didn’t significantly impact my score, as the assumptions are already more relaxed compared to previous works, and the authors have adequately addressed these limitations in the discussion. • I wish there were more intuitive explanations of how $\lambda$ interpolates between learning regimes. The authors provide some results in this di

Videos

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks · SlidesLive

Taxonomy

Topics: Neural Networks and Applications