Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

Jasper Zhang; Bryan Cheng

arXiv:2604.07848·cs.LG·April 10, 2026

TL;DR

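Gradient-based task affinity estimates in multi-task learning are meaningful only when tasks share enough training instances: below 30% sample overlap, gradient-task correlations are indistinguishable from noise, and above 40% they recover known biological structure. The paper argues that this unstated requirement explains previously inconsistent results.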

Contribution

The paper identifies a sample-overlap threshold below which gradient-based task analysis is not valid, and uses it to explain previously inconsistent multi-task learning outcomes.

Findings

01

Gradient-task correlations are indistinguishable from noise below 30% overlap.

02

Above 40% overlap, correlations reliably reflect biological structure.

03

Standard benchmarks operate far below the overlap threshold, invalidating gradient analysis.
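
As a rough illustration of the quantity these thresholds refer to, the sketch below computes a sample overlap between two tasks from their label-availability masks. The exact definition used in the paper is not given on this page; the Jaccard-style ratio here (instances labeled for both tasks divided by instances labeled for either) is an assumption, and `sample_overlap` and the example masks are hypothetical names and data.

```python
import numpy as np

def sample_overlap(mask_a, mask_b):
    """Jaccard-style overlap (assumed definition, not taken from the paper):
    instances labeled for both tasks / instances labeled for either task."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0  # no labeled instances at all
    return float(np.logical_and(mask_a, mask_b).sum() / union)

# Hypothetical label-availability masks over 10 training instances.
task_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
task_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

overlap = sample_overlap(task_a, task_b)
print(f"sample overlap: {overlap:.0%}")
```

In this toy case only two of the ten instances carry labels for both tasks, so the pair would fall below the 30% level reported above.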

Abstract

Multi-task learning shows strikingly inconsistent results -- sometimes joint training helps substantially, sometimes it actively harms performance -- yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple…
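
The gradient alignment the abstract refers to can be illustrated with a minimal sketch. It assumes a shared linear model, a mean-squared-error loss, and cosine similarity between per-task gradients as the alignment measure; none of these choices is confirmed by this page, and `mse_gradient`, `cosine`, and the synthetic tasks are hypothetical. Both gradients are evaluated on the same inputs `X`, which is the shared-instance setting the abstract describes.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to the shared weights w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # shared inputs: every task is measured on the same instances
w = np.zeros(5)                 # shared parameters at initialization
w_true = rng.normal(size=5)

y_a = X @ w_true                # task A targets
y_b = 2.0 * y_a                 # task B: same structure as A, rescaled
y_c = -y_a                      # task C: opposed to A

g_a = mse_gradient(w, X, y_a)
g_b = mse_gradient(w, X, y_b)
g_c = mse_gradient(w, X, y_c)

align_ab = cosine(g_a, g_b)     # close to +1: gradients agree
align_ac = cosine(g_a, g_c)     # close to -1: gradients conflict
print(align_ab, align_ac)
```

If the two gradients were instead computed on disjoint subsets of `X`, their cosine similarity would also depend on how the subsets differ, which is the confound with distributional shift that the abstract points to.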
