TL;DR
This paper introduces a method in which meta-learning determines which neural network weights to update. The resulting patterned sparsity improves generalization and reduces interference in few-shot and continual learning tasks.
Contribution
It demonstrates that learning where to learn via sparse gradient updates enhances meta-learning performance and reveals problem-specific sparsity patterns.
Findings
Patterned sparsity emerges when the learning algorithm decides which weights to change, and the pattern varies from problem to problem.
Sparse learning improves generalization in few-shot tasks.
Sparse learning also emerges in a more expressive model in which learning rates are meta-learned.
Abstract
Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis. This selective sparsity results in better generalization and less interference in a range of few-shot and continual learning problems. Moreover, we find that sparse learning also emerges in a more expressive model where learning rates are meta-learned. Our results shed light on an ongoing debate on whether meta-learning can discover adaptable features and suggest that learning by sparse…
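To make the idea of "learning where to learn" concrete, the following is a minimal sketch of a masked inner-loop update on a toy linear regression task. It is an illustration under stated assumptions, not the paper's implementation: the function names (`masked_inner_step`, `mse_grad`), the thresholding of real-valued scores at zero to obtain a binary mask, and the fixed values of `theta0` and `scores` are all hypothetical stand-ins for quantities that would be meta-learned in an outer loop, which is omitted here.

```python
import numpy as np

def masked_inner_step(theta, grad, scores, lr=0.1):
    """One gradient step that only changes weights whose score is non-negative.

    Assumption: the binary mask comes from thresholding real-valued scores at 0.
    """
    mask = (scores >= 0).astype(theta.dtype)  # 1 = update this weight, 0 = freeze it
    return theta - lr * mask * grad, mask

def mse_grad(theta, x, y):
    """Gradient of the mean squared error for a linear model y_hat = x @ theta."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / len(y)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                  # a small 5-example task
y = x @ np.array([1.0, -2.0, 0.5, 0.0])      # targets from a known linear map

theta0 = np.zeros(4)                         # stand-in for a meta-learned initialization
scores = np.array([0.3, -0.1, 0.7, -0.5])    # stand-in for meta-learned scores

# Inner loop: a few sparse gradient steps starting from the initialization.
theta = theta0.copy()
for _ in range(3):
    theta, mask = masked_inner_step(theta, mse_grad(theta, x, y), scores)

print("mask: ", mask)
print("theta:", theta)
```

In this sketch only the first and third weights move away from the initialization; the other two stay frozen because their scores are negative. In a full meta-learning setup, both the initialization and the scores would be adjusted across many tasks so that the frozen and plastic weights are chosen to minimize post-adaptation error.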
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Model Reduction and Neural Networks
