Characterization of Transfer Using Multi-task Learning Curves
András Millinghoffer, Bence Bolgár, Péter Antal

TL;DR
This paper models transfer effects in multi-task learning using learning curves to better understand how adding data influences model performance, providing a new quantitative approach for transfer analysis.
Contribution
It introduces an efficient method to approximate multi-task learning curves, offering a more fundamental characterization of transfer effects compared to traditional gradient-based methods.
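To make the learning-curve view of transfer concrete, here is a minimal sketch under toy assumptions. It is not the authors' method. Tasks are hypothetical linear-regression problems, and the auxiliary task is a small shift of the target's weights. The auxiliary samples are simply pooled with the target samples. Each curve is fitted with a power law err(n) ≈ a·n^(-b), and transfer is scored as the relative reduction in predicted target error at a given target sample size n. The helper names (`learning_curve`, `fit_power_law`, `transfer`) and the choice of transfer score are all illustrative.

```python
import numpy as np

def sample_task(rng, w, n, noise=0.5):
    """Draw n samples of a linear task y = X w + eps with X ~ N(0, I)."""
    X = rng.normal(size=(n, w.size))
    y = X @ w + noise * rng.normal(size=n)
    return X, y

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression; lam is small, so this is close to least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def learning_curve(rng, w_target, sizes, w_aux=None, n_aux=0, reps=30):
    """Mean excess risk on the target task for each target sample size n.

    With X ~ N(0, I), the excess test MSE of w_hat equals
    ||w_hat - w_target||^2, so no separate test set is needed.
    """
    curve = []
    for n in sizes:
        errs = []
        for _ in range(reps):
            X, y = sample_task(rng, w_target, n)
            if w_aux is not None and n_aux > 0:
                # Hypothetical transfer mechanism: pool auxiliary samples.
                Xa, ya = sample_task(rng, w_aux, n_aux)
                X, y = np.vstack([X, Xa]), np.concatenate([y, ya])
            w_hat = fit_ridge(X, y)
            errs.append(np.sum((w_hat - w_target) ** 2))
        curve.append(np.mean(errs))
    return np.array(curve)

def fit_power_law(sizes, errs):
    """Fit err(n) ~ a * n**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errs), 1)
    return np.exp(intercept), -slope

rng = np.random.default_rng(0)
sizes = np.array([10, 20, 40, 80, 160])
w_target = rng.normal(size=5)
w_aux = w_target + 0.1  # hypothetical related task: every weight shifted by 0.1

a0, b0 = fit_power_law(sizes, learning_curve(rng, w_target, sizes))
a1, b1 = fit_power_law(sizes, learning_curve(rng, w_target, sizes, w_aux, n_aux=50))

def transfer(n):
    """Relative reduction in predicted target error from adding auxiliary data."""
    return 1.0 - (a1 * n ** -b1) / (a0 * n ** -b0)

print(f"exponent alone={b0:.2f}, with aux={b1:.2f}, transfer(20)={transfer(20):.2f}")
```

The sign of `transfer(n)` is not guaranteed. Pooling data from a different task adds bias, so in a setting like this one would expect the benefit to shrink as the target sample size grows, and it may turn negative. Scoring transfer along the whole curve, rather than at a single training snapshot, is what separates this view from gradient-based measures.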
Findings
Learning curves effectively capture transfer effects in multi-task learning.
Multi-task extensions can distinguish pairwise and contextual transfer effects.
The proposed method is computationally efficient and broadly applicable.
Abstract
Transfer effects manifest themselves both during training on a fixed data set and in inductive inference with accumulating data. We hypothesize that perturbing the data set by including more samples, instead of perturbing the model by gradient updates, provides a complementary and more fundamental characterization of transfer effects. To capture this phenomenon, we quantitatively model transfer effects using multi-task learning curves that approximate the inductive performance over varying sample sizes. We describe an efficient method to approximate multi-task learning curves, analogous to the Task Affinity Grouping method applied during training. Comparing the statistical and computational approaches to transfer indicates considerably higher compute costs for the former, but also greater power and broader applicability. Evaluations are performed using a benchmark drug-target…
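For contrast with the data-perturbation view, the sketch below shows the model-perturbation side that the abstract compares against. It is a simplified lookahead inter-task affinity in the spirit of Task Affinity Grouping. During joint training, a single gradient step is taken on task i alone, and the score is Z[i, j] = 1 - L_j(after step) / L_j(before step). A positive score means the step on task i helped task j. This reflects our understanding of the idea rather than the reference implementation. All parameters are shared (no task-specific heads), the tasks are hypothetical linear-regression problems, and `tag_affinities`, `lr` and `steps` are illustrative names and values.

```python
import numpy as np

def mse_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    r = X @ w - y
    return float(np.mean(r ** 2)), 2.0 * X.T @ r / len(y)

def tag_affinities(tasks, d, steps=100, lr=0.05, seed=0):
    """Average lookahead affinity Z[i, j] = 1 - L_j(w - lr*grad L_i) / L_j(w).

    Simplified: all parameters w are shared (no task-specific heads),
    and the model itself is trained on the sum of the task losses.
    """
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.normal(size=d)
    k = len(tasks)
    Z = np.zeros((k, k))
    for _ in range(steps):
        losses, grads = zip(*(mse_and_grad(w, X, y) for X, y in tasks))
        for i in range(k):
            w_look = w - lr * grads[i]  # lookahead step on task i only
            for j in range(k):
                Z[i, j] += 1.0 - mse_and_grad(w_look, *tasks[j])[0] / losses[j]
        w = w - lr * sum(grads)  # actual joint update on all tasks
    return Z / steps

# Hypothetical tasks: A and B are similar, C has opposite weights.
rng = np.random.default_rng(1)
d = 5
u = rng.normal(size=d)

def make_task(w, n=200):
    X = rng.normal(size=(n, d))
    return X, X @ w + 0.1 * rng.normal(size=n)

tasks = [make_task(u), make_task(u + 0.1 * rng.normal(size=d)), make_task(-u)]
Z = tag_affinities(tasks, d)
print(np.round(Z, 3))
```

Here affinity is read off parameter updates on a fixed data set. The learning-curve approach instead varies how much data each task contributes.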
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
