Visualizing and Understanding Curriculum Learning for Long Short-Term Memory Networks
Volkan Cirik, Eduard Hovy, Louis-Philippe Morency

TL;DR
This paper investigates how curriculum learning influences LSTM networks, demonstrating that it improves internal representations, especially in smaller models and with limited data, enhancing NLP task performance.
Contribution
It provides the first detailed analysis of curriculum learning's impact on LSTM internal states and shows its benefits in small data scenarios.
Findings
Curriculum learning biases LSTM internal states towards constructive representations.
Smaller models benefit significantly from curriculum learning.
Curriculum learning is most effective when training data is limited.
Abstract
Curriculum Learning emphasizes the order of training instances in a computational learning setup. The core hypothesis is that simpler instances should be learned early as building blocks to learn more complex ones. Despite its usefulness, it is still unknown how exactly the internal representations of models are affected by curriculum learning. In this paper, we study the effect of curriculum learning on Long Short-Term Memory (LSTM) networks, which have shown strong competency in many Natural Language Processing (NLP) problems. Our experiments on a sentiment analysis task and a synthetic task similar to sequence prediction tasks in NLP show that curriculum learning has a positive effect on the LSTM's internal states by biasing the model towards building constructive representations, i.e., the internal representations at the previous timesteps are used as building blocks for the final…
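The ordering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `curriculum_stages`, the use of token count as the difficulty measure, and the option to keep earlier (easier) examples in later stages are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int,
    cumulative: bool = True,
) -> Iterator[List[T]]:
    """Yield training subsets ordered from easy to hard.

    Hypothetical helper: examples are sorted by `difficulty` and split into
    `num_stages` buckets of roughly equal size. With `cumulative=True`, each
    stage also contains every easier bucket; otherwise each stage contains
    only its own bucket.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # sorted() is stable, so ties keep their original relative order.
    ordered = sorted(examples, key=difficulty)
    n = len(ordered)
    # Bucket boundaries computed with integer arithmetic.
    bounds = [(i * n) // num_stages for i in range(num_stages + 1)]
    for i in range(num_stages):
        start = 0 if cumulative else bounds[i]
        yield ordered[start:bounds[i + 1]]


# Assumed difficulty proxy: number of whitespace-separated tokens.
def token_count(sentence: str) -> int:
    return len(sentence.split())


sentences = [
    "good",
    "not good",
    "not good at all",
    "bad",
    "really quite bad",
    "fine film",
]

for stage_index, stage in enumerate(curriculum_stages(sentences, token_count, 3)):
    # A real training loop would run one or more epochs on `stage` here.
    print(stage_index, stage)
```

In a training setup, each yielded stage would be passed to the model in turn, so that short sentences are seen before long ones. Any other difficulty measure can be swapped in through the `difficulty` argument.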
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
