How to distribute data across tasks for meta-learning?

Alexandru Cioba; Michael Bromberg; Qian Wang; Ritwik Niyogi; Georgios Batzolis; Jezabel Garcia; Da-shan Shiu; Alberto Bernacchia

arXiv:2103.08463·cs.LG·April 11, 2022

Open Access

TL;DR

This paper investigates optimal data distribution strategies across tasks in meta-learning, revealing how task difficulty and homogeneity influence label allocation to improve learning efficiency and transferability.

Contribution

It provides a theoretical and empirical analysis of label allocation in meta-learning, offering guidelines for data distribution based on task similarity and difficulty.

Findings

1. Uniform allocation is optimal for homogeneous tasks.

2. At a fixed budget, there is a trade-off between the number of tasks and the amount of data per task.

3. Harder tasks benefit from more data when trained separately.

Abstract

Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary, and it is unknown how it affects performance at test time. Since labelling data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and fewer to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between number of tasks and number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more…
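
The budget question in the abstract can be made concrete with a minimal sketch. It assumes uniform allocation (every task gets the same number of labels) and uses a hypothetical helper, `uniform_allocations`, that is not taken from the paper. It only enumerates the candidate splits of a fixed label budget and says nothing about which split is optimal.

```python
def uniform_allocations(budget):
    """Return every (num_tasks, labels_per_task) pair that spends the budget exactly.

    Hypothetical illustration: assumes all tasks receive the same number of labels.
    """
    return [
        (num_tasks, budget // num_tasks)
        for num_tasks in range(1, budget + 1)
        if budget % num_tasks == 0  # keep only splits with no leftover labels
    ]


# Candidate splits for an illustrative budget of 12 labels,
# ranging from one heavily labelled task to many sparsely labelled ones.
allocations = uniform_allocations(12)
for num_tasks, labels_per_task in allocations:
    print(f"{num_tasks} task(s) x {labels_per_task} label(s) each")
```

Each pair is one point on the trade-off the abstract describes: moving down the list increases the number of tasks while shrinking the data available for each one.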

Videos

How to Distribute Data across Tasks for Meta-Learning? · Underline

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Radiomics and Machine Learning in Medical Imaging