Curriculum Learning with Adam: The Devil Is in the Wrong Details
Lucas Weber, Jaap Jumelet, Paul Michel, Elia Bruni, Dieuwke Hupkes

TL;DR
This paper investigates why curriculum learning methods often underperform in NLP. It finds that, when curricula are combined with the Adam optimizer, their apparent gains often come from compensating for suboptimally chosen Adam hyperparameters.
Contribution
The study shows that recent curriculum learning methods are brittle when applied to NLP, and that the choice of Adam hyperparameters often overshadows any effect of the curriculum itself. This calls into question gains previously attributed to the curricula.
Findings
Curriculum methods are brittle when combined with Adam in NLP.
Adam hyperparameters often dominate curriculum effects.
No curriculum approach outperforms well-tuned Adam alone.
Abstract
Curriculum learning (CL) posits that machine learning models -- similar to humans -- may learn more efficiently from data that match their current learning progress. However, CL methods are still poorly understood and, in particular for natural language processing (NLP), have achieved only limited success. In this paper, we explore why. Starting from an attempt to replicate and extend a number of recent curriculum methods, we find that their results are surprisingly brittle when applied to NLP. A deep dive into the (in)effectiveness of the curricula in some scenarios shows us why: when curricula are employed in combination with the popular Adam optimisation algorithm, they oftentimes learn to adapt to suboptimally chosen optimisation parameters for this algorithm. We present a number of different case studies with different common hand-crafted and automated CL approaches to illustrate…
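The interaction between a curriculum and Adam can be made concrete with a toy calculation. The sketch below is not taken from the paper. It is a minimal illustration, assuming a single scalar parameter, a constant raw gradient, and a hypothetical curriculum that only reweights the loss (and therefore the gradient) at each step. It relies on a well-known property of Adam: a constant rescaling of the gradient leaves the update almost unchanged, whereas a weight that changes over time alters the effective step size, much like a learning-rate schedule would. The function name `adam_steps` and the weighting schemes are assumptions made for this example.

```python
import math

def adam_steps(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updates Adam applies for a sequence of scalar gradients."""
    m, v = 0.0, 0.0
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates

steps = 200
base = [1.0] * steps                                   # toy assumption: constant raw gradient
const_w = [0.1 * g for g in base]                      # hypothetical constant loss weight
decay_w = [(0.98 ** t) * g for t, g in enumerate(base)]  # hypothetical decaying loss weight

u_base = adam_steps(base)
u_const = adam_steps(const_w)
u_decay = adam_steps(decay_w)

# Compare the size of the final update under each weighting scheme.
print("constant weight / baseline:", u_const[-1] / u_base[-1])
print("decaying weight / baseline:", u_decay[-1] / u_base[-1])
```

In this toy setting the constant weight barely changes the update, while the decaying weight shrinks it, because the second-moment average (with `beta2=0.999`) forgets large early gradients much more slowly than the first-moment average (with `beta1=0.9`). A comparison against a well-tuned learning rate or schedule is therefore needed before crediting a curriculum with an improvement.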
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Neural Networks and Applications
Methods: Adam
