Stateless Neural Meta-Learning using Second-Order Gradients
Mike Huisman, Aske Plaat, Jan N. van Rijn

TL;DR
This paper introduces TURTLE, a new meta-learning algorithm that uses second-order gradients to outperform MAML and the meta-learner LSTM on few-shot learning tasks, at a computational cost comparable to second-order MAML.
Contribution
The paper formally shows that the meta-learner LSTM subsumes MAML and proposes TURTLE, an algorithm that is simpler than the meta-learner LSTM yet more expressive than MAML, and that leverages second-order gradients for improved performance.
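The subsumption result rests on the fact that the meta-learner LSTM stores the learner's weights in its cell state, updated as c_t = f_t ⊙ c_{t-1} + i_t ⊙ (−∇L). The minimal Python sketch below (function names are illustrative, not taken from the paper) sets the forget gate to 1 and the input gate to a constant learning rate α and checks that the resulting update matches one plain gradient-descent step, the kind MAML takes in its inner loop. In the actual meta-learner LSTM the gates are produced by sigmoid layers, so a forget gate of exactly 1 is only approached in the limit; passing gate values in directly is a simplifying assumption.

```python
from typing import List


def lstm_cell_update(theta_prev: List[float], grad: List[float],
                     forget: List[float], inp: List[float]) -> List[float]:
    """Meta-learner-LSTM-style update where the cell state holds the weights.

    c_t = f_t * c_{t-1} + i_t * c~_t, with candidate c~_t = -grad (elementwise).
    Gate values are supplied directly here instead of via sigmoid layers.
    """
    return [f * c + i * (-g) for c, g, f, i in zip(theta_prev, grad, forget, inp)]


def sgd_step(theta: List[float], grad: List[float], lr: float) -> List[float]:
    """One plain gradient-descent step, as in MAML's inner loop."""
    return [t - lr * g for t, g in zip(theta, grad)]


theta = [0.5, -1.0, 2.0]
grad = [0.2, -0.4, 1.0]
alpha = 0.1

# Forget gate fixed to 1 and input gate fixed to alpha: the cell update is SGD
lstm_out = lstm_cell_update(theta, grad, forget=[1.0] * 3, inp=[alpha] * 3)
sgd_out = sgd_step(theta, grad, alpha)
print(lstm_out)
print(sgd_out)
```

With other gate values (for example a forget gate below 1, which shrinks the weights, or per-coordinate input gates) the same cell produces updates that a single fixed learning rate cannot, which is the intuition behind the LSTM being the more general of the two.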
Findings
TURTLE outperforms MAML and the meta-learner LSTM on few-shot sine wave regression and on few-shot image classification (miniImageNet and CUB).
Second-order gradients significantly boost meta-learner performance.
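Second-order gradients enter because the meta-gradient has to differentiate through the inner-loop update θ' = θ − α∇L_support(θ), which brings in the Hessian of the support loss; first-order approximations drop that term. The sketch below uses a hypothetical one-parameter linear model ŷ = w·x with mean squared error, chosen because its Hessian is a constant and the chain rule can be written out by hand, and checks the second-order meta-gradient against a central finite-difference estimate. It illustrates the quantity involved, not the paper's experiments or implementation.

```python
from typing import List, Tuple

Data = Tuple[List[float], List[float]]  # (inputs, targets)


def mse(w: float, data: Data) -> float:
    xs, ys = data
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w: float, data: Data) -> float:
    # d/dw of the mean squared error for the linear model y_hat = w * x
    xs, ys = data
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def mse_hess(data: Data) -> float:
    # d^2/dw^2 of the mean squared error; constant because the model is linear in w
    xs, _ = data
    return sum(2 * x * x for x in xs) / len(xs)


def adapt(w: float, support: Data, alpha: float) -> float:
    # One inner-loop gradient step on the support set
    return w - alpha * mse_grad(w, support)


def meta_objective(w: float, support: Data, query: Data, alpha: float) -> float:
    # Query loss evaluated at the adapted parameter
    return mse(adapt(w, support, alpha), query)


def meta_gradients(w: float, support: Data, query: Data,
                   alpha: float) -> Tuple[float, float]:
    w_adapted = adapt(w, support, alpha)
    outer = mse_grad(w_adapted, query)
    # Chain rule: d w_adapted / d w = 1 - alpha * (Hessian of the support loss)
    second_order = outer * (1.0 - alpha * mse_hess(support))
    # First-order approximation treats d w_adapted / d w as 1
    first_order = outer
    return second_order, first_order


support = ([1.0, 2.0], [2.0, 4.0])
query = ([3.0], [6.0])
w0, alpha, eps = 0.0, 0.1, 1e-5

second_order, first_order = meta_gradients(w0, support, query, alpha)
finite_diff = (meta_objective(w0 + eps, support, query, alpha)
               - meta_objective(w0 - eps, support, query, alpha)) / (2 * eps)
print(second_order, first_order, finite_diff)
```

For this toy task the second-order meta-gradient is −9 and matches the finite-difference estimate, while the first-order approximation gives −18, overestimating the meta-gradient's magnitude by a factor of two because it ignores the curvature of the support loss.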
TURTLE achieves superior accuracy without additional hyperparameter tuning.
Abstract
Deep learning typically requires large data sets and much computing power for each new problem that is learned. Meta-learning can be used to learn a good prior that facilitates quick learning, thereby relaxing these requirements so that new tasks can be learned more quickly; two popular approaches are MAML and the meta-learner LSTM. In this work, we compare the two and formally show that the meta-learner LSTM subsumes MAML. Combining this insight with recent empirical findings, we construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML. TURTLE outperforms both techniques at few-shot sine wave regression and image classification on miniImageNet and CUB without any additional hyperparameter tuning, at a computational cost that is comparable with second-order MAML. The key to TURTLE's success lies in the use of second-order gradients,…
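The abstract places TURTLE between MAML and the meta-learner LSTM but does not spell out its architecture here. As a hypothetical illustration only (not the authors' design), a "stateless" neural update rule can be read as a small network that maps the current gradient of each weight to an update, with no cell or hidden state carried from one step to the next. The sketch assumes a coordinate-wise rule made of a linear gradient term plus a few tanh units; the class name and parameterization are invented for illustration. Setting the tanh weights to zero recovers a MAML-style gradient step, while nonzero weights give updates that no single learning rate can express. In a meta-learning setup the rule's parameters would be meta-trained by differentiating the query loss through the inner-loop updates, which is where second-order gradients appear; that outer loop is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatelessUpdateRule:
    """Hypothetical coordinate-wise learned update rule with no hidden state.

    update(g) = -a * g + sum_k v[k] * tanh(w[k] * g + b[k])
    The fields (a, w, b, v) play the role of meta-parameters.
    """
    a: float
    w: List[float] = field(default_factory=list)
    b: List[float] = field(default_factory=list)
    v: List[float] = field(default_factory=list)

    def update(self, g: float) -> float:
        # Small tanh layer added on top of a linear gradient term
        nonlinear = sum(vk * math.tanh(wk * g + bk)
                        for wk, bk, vk in zip(self.w, self.b, self.v))
        return -self.a * g + nonlinear

    def step(self, theta: List[float], grads: List[float]) -> List[float]:
        # Same rule for every coordinate; nothing is remembered between calls
        return [t + self.update(g) for t, g in zip(theta, grads)]


theta = [1.0, -2.0]
grads = [0.5, -0.3]

# No tanh units: exactly gradient descent with step size a (MAML's inner loop)
sgd_like = StatelessUpdateRule(a=0.1)
# With a tanh unit the update is a nonlinear function of the gradient
learned = StatelessUpdateRule(a=0.1, w=[2.0], b=[0.0], v=[-0.05])

print(sgd_like.step(theta, grads))
print(learned.step(theta, grads))
```

Compared with the LSTM-style cell, this rule has no gates or memory to maintain, which is one way to read "simpler than the meta-learner LSTM", while the learnable nonlinearity is one way to read "more expressive than MAML".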
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
Methods: Tanh Activation · Sigmoid Activation · Model-Agnostic Meta-Learning · Long Short-Term Memory
