Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning
Jasper Zhang, Bryan Cheng

TL;DR
This paper shows that gradient-based task affinity estimation in multi-task learning is meaningful only when tasks share a substantial fraction of their training instances. This requirement explains past inconsistent results and grounds a new theoretical framework.
Contribution
It identifies a critical sample overlap threshold for gradient analysis validity and explains previous inconsistencies in multi-task learning outcomes.
Findings
Gradient-task correlations are indistinguishable from noise below 30% overlap.
Above 40% overlap, correlations reliably reflect biological structure.
Standard benchmarks operate far below the overlap threshold, so gradient analysis on them is not valid.
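To make the quantities in these findings concrete, here is a minimal Python sketch. It assumes that sample overlap is measured as the Jaccard fraction of instances labeled for both tasks, and that gradient alignment is measured as cosine similarity. The summary does not state the paper's exact definitions, so both choices and all function names are hypothetical. Only the 30% and 40% thresholds come from the text above.

```python
import numpy as np

def sample_overlap(mask_a, mask_b):
    """Jaccard overlap between two tasks' label masks (assumed definition).

    Each mask marks which training instances carry a label for that task.
    """
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0  # neither task has any labeled instance
    return float(np.logical_and(mask_a, mask_b).sum() / union)

def gradient_cosine(grad_a, grad_b):
    """Cosine similarity between two flattened task gradients."""
    grad_a = np.ravel(grad_a).astype(float)
    grad_b = np.ravel(grad_b).astype(float)
    denom = np.linalg.norm(grad_a) * np.linalg.norm(grad_b)
    if denom == 0:
        return 0.0  # alignment is undefined for a zero gradient; report 0
    return float(grad_a @ grad_b / denom)

def overlap_regime(overlap):
    """Map an overlap fraction onto the regimes named in the findings."""
    if overlap < 0.30:
        return "noise"       # correlations indistinguishable from noise
    if overlap > 0.40:
        return "reliable"    # correlations reflect task structure
    return "transition"      # between the two reported thresholds

# Toy example: five instances, two of which are labeled for both tasks.
mask_a = [1, 1, 1, 0, 0]
mask_b = [0, 1, 1, 1, 0]
overlap = sample_overlap(mask_a, mask_b)
regime = overlap_regime(overlap)
```

In this toy example two of the four instances labeled for either task are shared, so the pair falls in the regime where, according to the findings, gradient alignment can be interpreted. A check of this kind would be run before computing `gradient_cosine` for a task pair.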
Abstract
Multi-task learning shows strikingly inconsistent results -- sometimes joint training helps substantially, sometimes it actively harms performance -- yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple…
