An Analytical Theory of Curriculum Learning in Teacher-Student Networks
Luca Saglietti, Stefano Sarao Mannelli, and Andrew Saxe

TL;DR
This paper uses statistical physics methods to analyse curriculum learning in neural networks, showing that a curriculum modestly speeds up online training and can significantly improve test performance once the successive learning phases are explicitly connected.
Contribution
It offers an exact analytical framework for understanding when and why curriculum learning benefits neural network training and generalization.
Findings
Curriculum learning modestly speeds up online training.
Without modifications, curriculum does not improve generalization.
Connecting learning phases with priors enhances test performance.
Abstract
In humans and animals, curriculum learning -- presenting data in a curated order -- is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate benefits. This stark difference in the importance of curriculum raises a fundamental theoretical question: when and why does curriculum learning help? In this work, we analyse a prototypical neural network model of curriculum learning in the high-dimensional limit, employing statistical physics methods. Curricula could in principle change both the learning speed and asymptotic performance of a model. To study the former, we provide an exact description of the online learning setting, confirming the long-standing experimental observation that curricula can modestly speed up learning. To study the latter, we derive performance in a batch learning setting,…
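The online setting mentioned in the abstract can be illustrated with a small numerical toy. The sketch below is not the paper's model or analysis: the abstract does not specify the architecture, the loss, or how difficulty is defined. It assumes a sparse sign-perceptron teacher, a linear student trained by one-pass logistic-loss SGD, and "easy" versus "hard" examples that differ only in the variance of the input features the teacher ignores. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_teacher(n, n_relevant, rng):
    # Assumption: the teacher depends only on the first n_relevant inputs.
    mask = np.zeros(n, dtype=bool)
    mask[:n_relevant] = True
    teacher = np.where(mask, rng.standard_normal(n), 0.0)
    return teacher, mask

def sample(n_samples, teacher, mask, irrelevant_var, rng):
    # Assumption: difficulty is set by the variance of the irrelevant inputs
    # (small variance = easy example, large variance = hard example).
    x = rng.standard_normal((n_samples, teacher.size))
    x[:, ~mask] *= np.sqrt(irrelevant_var)
    y = np.sign(x @ teacher)
    return x, y

def train_online(x, y, lr=1.0):
    # One-pass online SGD on the logistic loss; each example is seen once,
    # in the order given, so the ordering is the only difference between runs.
    n = x.shape[1]
    w = np.zeros(n)
    for xi, yi in zip(x, y):
        margin = yi * (w @ xi) / np.sqrt(n)
        w += lr * yi * xi / np.sqrt(n) / (1.0 + np.exp(margin))
    return w

def generalization_error(w, x, y):
    # Fraction of fresh examples on which student and teacher disagree.
    return float(np.mean(np.sign(x @ w) != y))

def compare_orderings(n=100, n_relevant=50, n_per_phase=2000, seed=0):
    rng = np.random.default_rng(seed)
    teacher, mask = make_teacher(n, n_relevant, rng)
    x_easy, y_easy = sample(n_per_phase, teacher, mask, 0.1, rng)
    x_hard, y_hard = sample(n_per_phase, teacher, mask, 2.0, rng)
    x_test, y_test = sample(5000, teacher, mask, 2.0, rng)  # evaluate on hard data

    x_all = np.concatenate([x_easy, x_hard])
    y_all = np.concatenate([y_easy, y_hard])
    perm = rng.permutation(2 * n_per_phase)
    orderings = {
        "curriculum": (x_all, y_all),  # easy first, then hard
        "anti_curriculum": (np.concatenate([x_hard, x_easy]),
                            np.concatenate([y_hard, y_easy])),
        "shuffled": (x_all[perm], y_all[perm]),
    }
    return {name: generalization_error(train_online(x, y), x_test, y_test)
            for name, (x, y) in orderings.items()}

if __name__ == "__main__":
    for name, err in compare_orderings().items():
        print(f"{name:16s} test error = {err:.3f}")
```

All three runs see exactly the same examples and differ only in presentation order, which is the comparison the online analysis is concerned with. A single seed at this small size is noisy, so any gap between orderings should be averaged over many seeds before reading anything into it.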
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Machine Learning and ELM
