Weighted Training for Cross-Task Learning
Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su

TL;DR
This paper presents TAWT, a simple, efficient weighted training algorithm for cross-task learning that minimizes task distance, with theoretical guarantees and successful application to NLP sequence tagging tasks.
Contribution
Introduction of TAWT, a novel weighted training method for cross-task learning with theoretical guarantees and practical effectiveness demonstrated on NLP tasks.
Findings
TAWT is easy to implement and computationally efficient.
TAWT achieves strong performance on multiple NLP sequence tagging tasks.
The representation-based task distance provides theoretical insights into cross-task learning.
Abstract
In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in natural language processing (NLP), including part-of-speech (PoS) tagging, chunking, predicate detection, and named entity recognition (NER). As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning, such as the choice of the source data and the impact of fine-tuning.
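As a rough illustration of what target-aware weighting of source tasks can look like, here is a minimal sketch. It is not the authors' implementation. It assumes the source-task weights live on the probability simplex and are updated with an exponentiated-gradient step, and that a task's usefulness is scored by the inner product between its gradient and the target-task gradient. The abstract does not specify any of these choices, and the names `update_task_weights`, `weighted_source_loss`, and `step_size` are hypothetical.

```python
import math


def update_task_weights(weights, target_grad, source_grads, step_size=1.0):
    """One exponentiated-gradient step on the probability simplex.

    Assumption (not from the paper's abstract): a source task whose gradient
    aligns with the target-task gradient should receive more weight.
    """
    # Alignment score: inner product of each source gradient with the target gradient.
    scores = [sum(s * t for s, t in zip(grad, target_grad)) for grad in source_grads]
    # Multiplicative update; subtracting the max score keeps exp() numerically stable
    # and does not change the result after normalization.
    top = max(scores)
    unnormalized = [w * math.exp(step_size * (s - top)) for w, s in zip(weights, scores)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_source_loss(weights, source_losses):
    """Weighted sum of per-task losses, the quantity a weighted trainer would minimize."""
    return sum(w * loss for w, loss in zip(weights, source_losses))


# Toy example: two source tasks, one aligned with the target and one opposed to it.
weights = [0.5, 0.5]
target_grad = [1.0, 0.0]
source_grads = [[1.0, 0.0], [-1.0, 0.0]]
new_weights = update_task_weights(weights, target_grad, source_grads)
combined_loss = weighted_source_loss(new_weights, [2.0, 4.0])
```

In this toy setup the aligned source task ends up with the larger weight, and the weights still sum to one. A full training loop would alternate this weight update with gradient steps on the weighted source loss.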
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Dropout · Softmax
