Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks
Daphna Weinshall, Gad Cohen, Dan Amir

TL;DR
This paper studies curriculum learning in theory, for stochastic gradient descent on the convex linear regression loss, and in practice, for CNN training. It ties the convergence rate of an ideal curriculum to example difficulty and shows that a curriculum inferred by transfer learning speeds up early convergence and can improve generalization.
Contribution
It provides a theoretical analysis of curriculum learning's convergence properties and proposes a transfer learning-based method to infer curricula for deep networks.
Findings
Convergence rate increases with example difficulty in theory.
A transfer-learning-based curriculum accelerates CNN training, most visibly at the beginning of training.
Curriculum learning improves generalization and robustness.
Abstract
We provide theoretical investigation of curriculum learning in the context of stochastic gradient descent when optimizing the convex linear regression loss. We prove that the rate of convergence of an ideal curriculum learning method is monotonically increasing with the difficulty of the examples. Moreover, among all equally difficult points, convergence is faster when using points which incur higher loss with respect to the current hypothesis. We then analyze curriculum learning in the context of training a CNN. We describe a method which infers the curriculum by way of transfer learning from another network, pre-trained on a different task. While this approach can only approximate the ideal curriculum, we observe empirically similar behavior to the one predicted by the theory, namely, a significant boost in convergence speed at the beginning of training. When the task is made more…
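The linear regression setting described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: difficulty is scored as the squared loss of each example under a reference hypothesis (here the least-squares solution, standing in for the ideal curriculum), examples are presented once from easiest to hardest, and the helper names `difficulty_scores`, `curriculum_order`, and `sgd_linear_regression`, the synthetic data, and the learning rate are all hypothetical. In the CNN setting, the abstract replaces the reference hypothesis with a network pre-trained on a different task.

```python
import numpy as np

def difficulty_scores(X, y, w_ref):
    """Squared loss of each example under a reference hypothesis w_ref (assumed scoring rule)."""
    return (X @ w_ref - y) ** 2

def curriculum_order(scores):
    """Indices sorted from easiest (lowest score) to hardest (highest score)."""
    return np.argsort(scores, kind="stable")

def sgd_linear_regression(X, y, order, lr=0.01):
    """One pass of SGD on the squared loss, visiting examples in the given order."""
    w = np.zeros(X.shape[1])
    for i in order:
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]  # gradient of (x.w - y)^2 w.r.t. w
        w -= lr * grad
    return w

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.5, size=200)

# Reference hypothesis: least-squares fit, standing in for the optimal hypothesis.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

scores = difficulty_scores(X, y, w_ref)
order = curriculum_order(scores)
w_curriculum = sgd_linear_regression(X, y, order)
```

Passing `rng.permutation(len(y))` as `order` instead gives a random-order baseline for comparison. The sketch makes no claim about which ordering converges faster on this data.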
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Linear Regression
