Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports
Arijit Sehanobish, McCullen Sandora, Nabila Abraham, Jayashri Pawar, Danielle Torres, Anasuya Das, Murray Becker, Richard Herzog, Benjamin Odry, Ron Vianu

TL;DR
This paper investigates why multi-task learning with transformers is effective, showing that aligned representations and gradients across tasks enable a single model to perform as well as task-specific models, validated on spine MRI report datasets.
Contribution
It demonstrates that aligned hidden representations and gradients across tasks explain multi-task learning effectiveness, validated on radiologist-annotated spine MRI datasets.
Findings
A single multi-task model matches task-specific models when the task-specific models show similar hidden representations and aligned gradients.
Aligned gradients and representations across tasks are key to multi-task learning success.
Method is simple, intuitive, and applicable to various NLP problems.
Abstract
Pretrained Transformer-based models fine-tuned on domain-specific corpora have changed the landscape of NLP. However, training or fine-tuning these models for individual tasks can be time-consuming and resource-intensive. Thus, a lot of current research is focused on using transformers for multi-task learning (Raffel et al., 2020) and on how to group tasks so that a multi-task model learns effective representations that can be shared across tasks (Standley et al., 2020; Fifty et al., 2021). In this work, we show that a single multi-tasking model can match the performance of task-specific models when the task-specific models show similar representations across all of their hidden layers and their gradients are aligned, i.e., their gradients follow the same direction. We hypothesize that the above observations explain the effectiveness of multi-task learning. We validate our observations…
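The two conditions named in the abstract, aligned gradients and similar hidden representations, can be illustrated with a minimal sketch. The abstract excerpt here does not say which measures the authors use, so both choices below are assumptions made for illustration: cosine similarity between flattened gradient vectors as a stand-in for "gradients follow the same direction", and linear CKA as one common way to compare layer activations. The function names and the random arrays are hypothetical.

```python
import numpy as np


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors.

    Values near +1 mean the two tasks push the shared weights in the
    same direction; values near -1 mean the updates conflict.
    """
    g1 = np.ravel(g1)
    g2 = np.ravel(g2)
    denom = np.linalg.norm(g1) * np.linalg.norm(g2)
    # Return 0.0 when either gradient is all zeros (direction undefined).
    return float(np.dot(g1, g2) / denom) if denom > 0 else 0.0


def linear_cka(X, Y):
    """Linear CKA between two activation matrices (an assumed metric).

    X has shape (n_examples, d1) and Y has shape (n_examples, d2); rows
    must correspond to the same inputs. The score lies in [0, 1] and is
    unchanged by rotating or uniformly rescaling either representation.
    """
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    denom = norm_x * norm_y
    # Return 0.0 when a representation is constant across examples.
    return float(cross / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-task gradients of the same shared parameters.
    grad_task_a = rng.normal(size=100)
    grad_task_b = grad_task_a + 0.1 * rng.normal(size=100)
    print("gradient cosine:", cosine_similarity(grad_task_a, grad_task_b))

    # Hypothetical activations of one hidden layer from two task models.
    acts_a = rng.normal(size=(50, 16))
    acts_b = acts_a + 0.1 * rng.normal(size=(50, 16))
    print("linear CKA:", linear_cka(acts_a, acts_b))
```

In practice such scores would be computed per hidden layer on a shared batch of reports; how the paper aggregates them across layers and tasks is not stated in this excerpt.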
Taxonomy
Topics: Medical Imaging and Analysis · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Dropout
