Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Jonathan Pilault; Amine Elhattami; Christopher Pal

arXiv:2009.09139 · cs.LG · April 22, 2022 · 1 citation

TL;DR

This paper introduces a conditionally adaptive multi-task learning architecture for NLP that makes transfer learning more efficient, requiring fewer parameters and less data while outperforming existing methods.

Contribution

The paper proposes a novel Transformer-based architecture with task-conditioned modules and a new data sampling strategy to enhance multi-task learning in NLP.
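One way to picture a task-conditioned module is a normalization layer whose scale and shift are generated from a task embedding, so every task shares the same weights but is modulated differently. The sketch below is only an illustration under that assumption and is not taken from the authors' code: the class name `TaskConditionedLayerNorm`, the helper functions, the dimensions, and the initialization are all hypothetical, and it uses plain Python lists rather than a deep learning framework to stay self-contained.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


class TaskConditionedLayerNorm:
    """Hypothetical module: shared normalization, task-generated scale and shift."""

    def __init__(self, hidden_size, task_dim, seed=0):
        rng = random.Random(seed)

        def small_random_matrix():
            return [[rng.gauss(0.0, 0.02) for _ in range(task_dim)]
                    for _ in range(hidden_size)]

        # Biases start at scale 1 and shift 0, so a zero task embedding
        # reduces the module to an ordinary layer norm.
        self.w_gamma, self.b_gamma = small_random_matrix(), [1.0] * hidden_size
        self.w_beta, self.b_beta = small_random_matrix(), [0.0] * hidden_size

    def __call__(self, hidden, task_embedding):
        # The task embedding produces the per-feature scale and shift.
        gamma = linear(self.w_gamma, self.b_gamma, task_embedding)
        beta = linear(self.w_beta, self.b_beta, task_embedding)
        normed = layer_norm(hidden)
        return [g * h + b for g, h, b in zip(gamma, normed, beta)]


norm = TaskConditionedLayerNorm(hidden_size=4, task_dim=3)
hidden = [0.5, -1.0, 2.0, 0.1]
out_a = norm(hidden, [1.0, 0.0, 0.0])  # illustrative embedding for task A
out_b = norm(hidden, [0.0, 1.0, 0.0])  # illustrative embedding for task B
```

The same `hidden` vector yields different outputs for the two task embeddings, while the module's parameters are shared across tasks. In a real model the task embedding and the generating weights would be learned jointly with the rest of the network.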

Findings

01

Surpasses single-task fine-tuning methods in efficiency and performance.

02

Achieves state-of-the-art results on multiple NLP benchmarks.

03

Uses only around 66% of the data that traditional methods require.

Abstract

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by…

Code & Models

Repositories

CAMTL/CA-MTL
PyTorch · Official

Videos

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data · SlidesLive

Taxonomy

Topics

Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Transformer · Adam · Softmax · Dense Connections