Critical Learning Periods Emerge Even in Deep Linear Networks
Michael Kleinman, Alessandro Achille, Stefano Soatto

TL;DR
This paper demonstrates that deep linear networks exhibit critical learning periods similar to those observed in biological systems, and provides an analytical account of how they emerge from network depth, data structure, and task relationships.
Contribution
It introduces the first analytically tractable model showing how critical periods arise in deep linear networks, linking them to network depth, data, and task interactions.
Findings
Critical periods depend on network depth and data structure.
Feature learning involves competition between sources.
Pre-training can impair transfer learning depending on task relationships.
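The competition between sources can be illustrated with a small numerical toy. This is a minimal sketch under stated assumptions, not the authors' model or code: each of two pathways is a chain of scalar weights, both sources carry the same unit-variance signal, the target equals that signal, and training uses full-batch gradient descent on the population loss. The function name `train_two_pathways` and all hyperparameters (`init`, `lr`, step counts) are hypothetical choices made for illustration.

```python
import numpy as np

def train_two_pathways(depth=3, deficit_steps=0, total_steps=3000,
                       lr=0.05, init=0.3):
    """Toy two-pathway deep linear model (illustrative assumption, not the paper's setup).

    Each pathway's effective weight is the product of `depth` scalar weights.
    With identical unit-variance sources and target y = x, the population loss
    is 0.5 * (1 - A - gate_b * B)**2, where A and B are the effective weights.
    During the deficit, pathway B's input is zeroed (gate_b = 0).
    """
    w_a = np.full(depth, init)
    w_b = np.full(depth, init)
    for step in range(total_steps):
        gate_b = 0.0 if step < deficit_steps else 1.0
        eff_a, eff_b = np.prod(w_a), np.prod(w_b)
        err = 1.0 - eff_a - gate_b * eff_b
        # Derivative of each pathway's product w.r.t. layer l is the
        # product of all the other layers in that pathway.
        grad_a = np.array([np.prod(np.delete(w_a, l)) for l in range(depth)])
        grad_b = np.array([np.prod(np.delete(w_b, l)) for l in range(depth)])
        w_a += lr * err * grad_a
        w_b += lr * err * gate_b * grad_b
    return float(np.prod(w_a)), float(np.prod(w_b))

# Control run: both sources available from the start.
a_ctrl, b_ctrl = train_two_pathways(deficit_steps=0)
# Deficit run: source B is removed for the first 500 steps, then restored.
a_def, b_def = train_two_pathways(deficit_steps=500)
print("control:", a_ctrl, b_ctrl)
print("deficit:", a_def, b_def)
```

In the control run the two pathways share the output equally by symmetry. In the deficit run, pathway A fits the target alone while B is silenced, so almost no error signal remains once B is restored, and B's effective weight stays close to its small initial value despite 2,500 further training steps. This is only meant to convey the flavor of the winner-take-all dynamics; the paper's analysis covers the general case.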
Abstract
Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models, and show that, surprisingly, such networks also display much of the behavior seen in biology and artificial networks, while being amenable to analytical treatment. We show that…
Peer Reviews
Decision: ICLR 2024 spotlight
1. This paper continues previous work (on the deep network), the experiments correspond to research on the depth of the network, data distribution, competition between sources, and pre-training.
2. This work on studying the competition of different data sources is solid, and it seems to be a relatively good job through the linear multi-pathway network.
3. Analytical and minimal models provide fundamental insight. Intuition and empirical observations match well.
1. The paper is less readable and requires a higher theoretical foundation. The description of the “linear multi-pathway framework” (Sec. 3.1) is not clear enough and lacks corresponding details.
2. Only studying deep linear networks seems not enough to clearly understand why deep networks have critical periods, and network depth and data sources cannot guarantee sufficient persuasiveness.
3. There are fewer categories of experiments, and it would be better if the authors could provide more type
Well written, interesting avenue of research. Results well explained early in the text, with clear examples and clear explanations of experimental settings and technical details. Particularly nice clarity given the cross-disciplinary nature of the work. Related work is well written, reasonably thorough (to my knowledge), and well-explained. Nice phase portraits. Captions are mostly well-explained and self contained! (I don't know why this is so rare, but good job)
It's a bit of a tradeoff with having a clear hypothesis, but I find the case oversimplified/misframed in the beginning and conclusion. Sure the literal biochemical explanation doesn't hold in an artificial setting, but analogies of this (reduced plasticity) could well occur. I don't think these are 'competing' hypotheses as they are framed; plasticity, imperfect optimization, etc. are *mechanisms* by which critical periods could arise in both artificial and biological systems; it doesn't tell u
1. This paper is well-written, technically solid and has a concrete flow.
2. This work offers insights into the reasons behind the emergence of critical learning periods in both biological and artificial networks.
1. I think the assumption of the deep linear network is a bit strong. Since in real-world applications, most neural networks require non-linear activations. A deep linear network can be actually approximated by a single-layer linear network.
2. Critical periods in artificial deep neural networks (DNNs) may be due to specificities of the optimization process, such as an annealing learning rate, or from defects in the artificial implementation and training, like ReLU units becoming frozen or gradi
Taxonomy
Topics: Neural dynamics and brain function · Neural Networks and Applications · Advanced Memory and Neural Computing
