Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity

Dongyue Li, Aneesh Sharma, and Hongyang R. Zhang

arXiv:2409.06091 · cs.LG · November 22, 2024

TL;DR

This paper introduces Grad-TAG, a gradient-based algorithm that efficiently estimates task affinities in multitask learning without extensive retraining, enabling scalable clustering of related tasks in large models.
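To make "task affinity" concrete before the clustering step: one common way to build a higher-order affinity matrix, assumed here rather than taken from this page, is to sample random subsets of tasks, record the loss on each task i of a model trained on each subset, and average task i's loss over the sampled subsets that also contain task j. The minimal NumPy sketch below does exactly that; `higher_order_affinity` is a hypothetical helper, and the subset losses could come either from full training or from a cheaper estimate. Lower entries suggest that task j tends to help task i, and a matrix like this is the kind of input a task-clustering step would consume.

```python
import numpy as np

def higher_order_affinity(num_tasks, subsets, subset_losses):
    """Hypothetical helper: average loss on task i over sampled subsets containing i and j.

    subsets:       list of task-id lists, e.g. [[0, 1], [0, 2]]
    subset_losses: subset_losses[k][i] is the loss on task i of a model trained on subsets[k]
    Returns a (num_tasks, num_tasks) array; NaN where no sampled subset contains both i and j.
    """
    total = np.zeros((num_tasks, num_tasks))
    count = np.zeros((num_tasks, num_tasks))
    for subset, losses in zip(subsets, subset_losses):
        for i in subset:
            for j in subset:
                total[i, j] += losses[i]  # row i always records task i's own loss
                count[i, j] += 1
    # 0/0 only occurs where count == 0; those entries are replaced by NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

Note that the matrix is generally not symmetric: entry (i, j) measures how task i fares alongside j, not the reverse.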

Contribution

Grad-TAG provides a novel, efficient method for estimating task affinities: a linearization technique approximates the loss for each task combination from a single base model, significantly reducing computational cost compared with naively retraining on every combination.
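As a sketch of what "linearization" means here, assume a binary classifier with output logit f(x; θ), base-model parameters θ_0, and logistic loss ℓ; these symbols, the data sets D_S (pooled data of the tasks in subset S) and D_i (data of task i), and the ridge weight λ are our notation, not taken from this page. The model is expanded to first order around θ_0, only the correction Δ is fitted on D_S, and the fitted linear model is then used to score each task i:

```latex
f(x;\theta_0+\Delta) \approx f(x;\theta_0) + g(x)^\top \Delta,
\qquad g(x) = \nabla_\theta f(x;\theta)\big|_{\theta=\theta_0}

\hat{\Delta}_S = \arg\min_{\Delta}\;
  \frac{1}{|D_S|}\sum_{(x,y)\in D_S} \ell\big(f(x;\theta_0) + g(x)^\top\Delta,\; y\big)
  + \frac{\lambda}{2}\,\|\Delta\|_2^2

\hat{L}_i(S) = \frac{1}{|D_i|}\sum_{(x,y)\in D_i} \ell\big(f(x;\theta_0) + g(x)^\top\hat{\Delta}_S,\; y\big)
```

Because g(x) is fixed at θ_0, the middle line is an ordinary regularized logistic regression with offsets f(x; θ_0), which fits the "Logistic Regression" entry listed under Methods.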

Findings

01. Estimates task affinities within 2.7% of the true values while using only 3% of the FLOPs of full training.

02. Achieves accurate affinity estimation on large graphs with 21M edges and 500 tasks.

03. Outperforms existing methods in accuracy and runtime efficiency.

Abstract

Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of tasks. Naively computing either of them requires repeatedly training on data from various task combinations, which is computationally intensive. We present a new algorithm, Grad-TAG, that can estimate task affinities without this repeated training. The key idea of Grad-TAG is to train a "base" model for all tasks and then use a linearization technique to estimate the loss of the model for a specific task combination. The linearization works by computing a gradient-based approximation of the loss,…
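The loss-estimation idea in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code in the repository: every task is binary classification; per-example gradient features g(x) and base logits f(x; θ_0) have already been computed once from the base model; and the helper names (`fit_linearized`, `estimate_losses`, `random_project`) and hyperparameters (`l2`, `lr`, `steps`) are hypothetical. The Gaussian random projection is an assumed, optional way to shrink very high-dimensional gradients before fitting.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = (1 + tanh(z / 2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def logistic_loss(logits, labels):
    # Mean binary cross-entropy written on logits: log(1 + e^z) - y * z
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def random_project(grads, k, seed=0):
    # Hypothetical: Gaussian projection of (n, D) gradient features to (n, k).
    # Entries ~ N(0, 1/k), so squared norms are preserved in expectation.
    rng = np.random.default_rng(seed)
    proj = rng.normal(scale=1.0 / np.sqrt(k), size=(grads.shape[1], k))
    return grads @ proj

def fit_linearized(grads, base_logits, labels, l2=1e-2, lr=0.5, steps=500):
    # Gradient descent on: mean logistic loss of (base_logits + grads @ delta)
    #                      + (l2 / 2) * ||delta||^2
    n, d = grads.shape
    delta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(base_logits + grads @ delta)
        delta -= lr * (grads.T @ (p - labels) / n + l2 * delta)
    return delta

def estimate_losses(task_data, subset, **fit_kwargs):
    # task_data: {task_id: (grads, base_logits, labels)}, all computed once at theta_0.
    # Fit one correction on the pooled data of `subset`, then score every task
    # with the linearized model instead of retraining the network.
    grads = np.concatenate([task_data[t][0] for t in subset])
    base = np.concatenate([task_data[t][1] for t in subset])
    labels = np.concatenate([task_data[t][2] for t in subset])
    delta = fit_linearized(grads, base, labels, **fit_kwargs)
    return {t: logistic_loss(b + g @ delta, y) for t, (g, b, y) in task_data.items()}
```

In a real pipeline the gradient features would be computed once with an autodiff framework and cached, so scoring a new task subset costs one small logistic-regression fit rather than a full training run. If gradients are projected, reusing the same `seed` for every task keeps all tasks in one shared projected space.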

Code & Models

Repositories

virtuosoresearch/scalablemtl
JAX · Official

Taxonomy

Methods: Logistic Regression