Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault, Amine Elhattami, Christopher Pal

TL;DR
This paper introduces a conditionally adaptive multi-task learning architecture for NLP that makes transfer learning more efficient, using fewer parameters and less data while outperforming existing methods.
Contribution
The paper proposes a novel Transformer-based architecture with task-conditioned modules and a new data sampling strategy to enhance multi-task learning in NLP.
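The summary does not say what form the task-conditioned modules take. Below is a minimal sketch of one common way to condition a Transformer sublayer on the task. It uses a FiLM-style layer normalization whose scale and shift are generated from a learned task embedding, so the projections are shared across tasks and each task adds only a small embedding. The class and function names, the zero initialization, and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, bias, z):
    """Compute weights @ z + bias, with weights given as a list of rows."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


class TaskConditionedLayerNorm:
    """Hypothetical FiLM-style layer norm conditioned on a task embedding.

    gamma(z) = W_gamma z + b_gamma and beta(z) = W_beta z + b_beta, where z is
    the embedding of the current task. W and b are shared by all tasks.
    """

    def __init__(self, hidden_dim, num_tasks, task_dim, seed=0):
        rng = random.Random(seed)
        # One small learned embedding per task: the only per-task parameters.
        self.task_emb = [[rng.gauss(0.0, 0.1) for _ in range(task_dim)]
                         for _ in range(num_tasks)]
        # Zero-initialized projections make the module start as a plain
        # layer norm (gamma = 1, beta = 0) for every task.
        self.w_gamma = [[0.0] * task_dim for _ in range(hidden_dim)]
        self.b_gamma = [1.0] * hidden_dim
        self.w_beta = [[0.0] * task_dim for _ in range(hidden_dim)]
        self.b_beta = [0.0] * hidden_dim

    def __call__(self, x, task_id):
        z = self.task_emb[task_id]
        gamma = linear(self.w_gamma, self.b_gamma, z)
        beta = linear(self.w_beta, self.b_beta, z)
        return [g * v + b for g, v, b in zip(gamma, layer_norm(x), beta)]


ln = TaskConditionedLayerNorm(hidden_dim=4, num_tasks=2, task_dim=3)
x = [1.0, 2.0, 3.0, 4.0]
same = ln(x, 0) == ln(x, 1)      # identical before the projections are trained
ln.w_gamma[0][0] = 5.0           # stand-in for a learned, non-zero projection
differs = ln(x, 0)[0] != ln(x, 1)[0]
```

In this sketch, adding a task costs only `task_dim` new parameters. Once the shared projections move away from zero, the same input is normalized differently depending on `task_id`.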
Findings
Surpasses single-task fine-tuning methods in efficiency and performance.
Achieves state-of-the-art results on multiple NLP benchmarks.
Uses around 66% of the data required by traditional methods.
Abstract
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by…
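The parameter-efficiency argument in the abstract comes down to simple arithmetic. Fine-tuning one model per task multiplies the full model size by the number of tasks. A shared model with a hypernetwork adapter pays for the base model once, plus one small task embedding per task and a shared hypernetwork. The sketch below assumes a simplified hypernetwork: a single linear map from a task embedding to a flat vector of adapter weights. All sizes are hypothetical illustrative values and do not come from the paper.

```python
def per_task_finetuning_params(base_params, num_tasks):
    """Total parameters when a full copy of the model is fine-tuned per task."""
    return base_params * num_tasks


def shared_hypernet_adapter_params(base_params, num_tasks, task_dim, adapter_params):
    """Total parameters for one shared model conditioned on the task.

    Assumed (simplified) layout:
      - the base model, stored once;
      - one task embedding of size task_dim per task;
      - a hypernetwork that linearly maps a task embedding to
        adapter_params adapter weights (weight matrix plus bias).
    """
    hypernet = task_dim * adapter_params + adapter_params
    return base_params + num_tasks * task_dim + hypernet


# Hypothetical sizes chosen only to illustrate the scaling.
BASE, TASKS, TASK_DIM, ADAPTER = 110_000_000, 8, 64, 100_000
separate = per_task_finetuning_params(BASE, TASKS)
shared = shared_hypernet_adapter_params(BASE, TASKS, TASK_DIM, ADAPTER)
```

With these illustrative numbers, separate fine-tuning needs 880,000,000 parameters while the shared design needs 116,500,512. Each additional task adds a full model in the first case but only `TASK_DIM` parameters in the second.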
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Transformer · Adam · Softmax · Dense Connections
