Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information
Yingya Li, Timothy Miller, Steven Bethard, Guergana Savova

TL;DR
This paper introduces a new metric based on pointwise V-usable information (PVI) to identify optimal task groupings for multi-task learning, improving performance and efficiency across diverse NLP domains.
Contribution
It proposes a novel task relatedness metric using PVI and demonstrates its effectiveness for task grouping in multi-task learning across multiple NLP datasets.
Findings
Grouping tasks with similar PVI estimates improves multi-task learning performance.
The PVI-based method achieves competitive results with fewer parameters.
The approach is effective across general, biomedical, and clinical NLP domains.
Abstract
The success of multi-task learning can depend heavily on which tasks are grouped together. Naively grouping all tasks or a random set of tasks can result in negative transfer, with the multi-task models performing worse than single-task models. Though many efforts have been made to identify task groupings and to measure the relatedness among different tasks, defining a metric that identifies the best task grouping out of a pool of many potential task combinations remains a challenging research topic. We propose a metric of task relatedness based on task difficulty measured by pointwise V-usable information (PVI). PVI is a recently proposed metric to estimate how much usable information a dataset contains given a model. We hypothesize that tasks whose PVI estimates are not statistically different are similar enough to benefit from the joint learning process. We conduct comprehensive…
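To make the hypothesis above concrete, here is a minimal sketch of how PVI values might be computed and compared between two tasks. It relies on the standard definition of PVI, PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y), where g is a model fine-tuned on real inputs and g' is the same architecture fine-tuned with the input replaced by an empty (null) input. Everything else is an assumption made for illustration: the function names are hypothetical, the probabilities are supplied by hand rather than taken from trained models, and a permutation test on mean PVI stands in for whichever statistical test the paper actually uses, which the text above does not specify.

```python
import math
import random
from statistics import mean


def pvi(p_with_input: float, p_null_input: float) -> float:
    """PVI for one instance, in bits.

    p_with_input: probability the input-aware model assigns to the gold label.
    p_null_input: probability the null-input model assigns to the gold label.
    """
    return math.log2(p_with_input) - math.log2(p_null_input)


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in mean PVI.

    Illustrative stand-in only; not necessarily the test used in the paper.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def similar_difficulty(a, b, alpha=0.05):
    """True when the two tasks' PVI samples are not significantly different."""
    return permutation_p_value(a, b) >= alpha


# Hand-written PVI samples for two hypothetical tasks.
task_a = [pvi(0.90, 0.50), pvi(0.80, 0.50), pvi(0.85, 0.50), pvi(0.95, 0.50)]
task_b = [pvi(0.88, 0.50), pvi(0.82, 0.50), pvi(0.86, 0.50), pvi(0.93, 0.50)]
print(similar_difficulty(task_a, task_b))
```

Under the hypothesis stated in the abstract, a pair of tasks for which `similar_difficulty` returns `True` would be a candidate for joint training, while a pair with clearly different PVI distributions would be kept apart.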
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
