Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
Yong Wu, Shekhor Chanda, Mehrdad Hosseinzadeh, Zhi Liu, Yang Wang

TL;DR
This paper introduces task-specific meta distillation, a meta-learning approach that jointly trains a large teacher model and a small student model to improve few-shot learning of compact models, outperforming existing methods.
Contribution
It proposes a new meta-learning framework that distills knowledge from a large teacher model to a small student model during meta-training for better few-shot adaptation.
Findings
Outperforms existing methods on benchmark datasets
Effectively adapts small models for few-shot learning
Demonstrates the benefit of joint teacher-student meta-training
Abstract
We consider a new problem of few-shot learning of compact models. Meta-learning is a popular approach for few-shot learning. Previous work in meta-learning typically assumes that the model architecture during meta-training is the same as the model architecture used for final deployment. In this paper, we challenge this basic assumption. For final deployment, we often need the model to be small. But small models usually do not have enough capacity to effectively adapt to new tasks. In the meantime, we often have access to a large dataset and extensive computing power during meta-training, since meta-training is typically performed on a server. In this paper, we propose task-specific meta distillation that simultaneously learns two models in meta-learning: a large teacher model and a small student model. These two models are jointly learned during meta-training. Given a new task during…
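The truncated abstract does not state which loss carries knowledge from the teacher to the student. As a rough illustration only, the sketch below uses the standard soft-target distillation loss (cross-entropy on the true label plus a temperature-scaled KL term), which is an assumption and not necessarily the paper's formulation. The function names and the `alpha` and `temperature` parameters are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical per-example student loss (assumed, not taken from the paper).

    Combines cross-entropy on the true label with KL(teacher || student)
    computed on temperature-softened predictions.
    """
    # Supervised term: cross-entropy of the student's prediction on the true label.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Distillation term: how far the student's softened distribution
    # is from the teacher's softened distribution.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The temperature**2 factor is the usual rescaling of the soft-target term.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * kl
```

In a meta-learning setting, a loss of this kind would plausibly be applied while adapting the student to a task, with `teacher_logits` coming from a teacher already adapted to the same task; how the paper actually arranges these steps is not recoverable from the abstract as shown here.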
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications
