BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Asa Cooper Stickland, Iain Murray

TL;DR
This paper introduces projected attention layers (PALs), adaptation modules that let a single BERT model be shared across tasks with a small number of additional task-specific parameters, while matching the performance of separately fine-tuned models on the GLUE benchmark.
Contribution
The paper proposes PALs, a new adaptation module for multi-task learning that uses roughly 7 times fewer parameters than separately fine-tuned BERT models while matching their performance on the GLUE benchmark.
Findings
PALs match the performance of separate fine-tuned BERT models on GLUE
PALs achieve state-of-the-art results on the Recognizing Textual Entailment dataset
Multi-task BERT with PALs uses roughly 7 times fewer parameters than separately fine-tuned BERT models
Abstract
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or 'projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
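The "roughly 7 times fewer parameters" figure can be read as a ratio between two setups: one full fine-tuned copy of the model per task versus one shared copy plus small task-specific additions. The sketch below works through that arithmetic. The task count and the overhead fraction are assumed values chosen for illustration only; neither is stated in the abstract, and `parameter_ratio` is a hypothetical helper, not code from the paper.

```python
def parameter_ratio(num_tasks: int, shared_overhead: float) -> float:
    """Ratio of total parameters: separate per-task models vs. one shared model.

    Both totals are measured in units of one base model, so the base
    model's absolute size cancels out of the ratio.
    """
    separate = float(num_tasks)      # one full fine-tuned copy per task
    shared = 1.0 + shared_overhead   # one shared copy plus all task-specific extras
    return separate / shared


# Assumed values for illustration (not given in the abstract):
NUM_TASKS = 8            # hypothetical number of tasks
SHARED_OVERHEAD = 0.13   # hypothetical: task-specific parameters add ~13% in total

ratio = parameter_ratio(NUM_TASKS, SHARED_OVERHEAD)
print(f"separate / shared = {ratio:.2f}x")  # prints "separate / shared = 7.08x"
```

Under these assumptions the saving grows almost linearly with the number of tasks, because each new task adds a full model copy to the separate setup but only a small increment to the shared one.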
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
