Weighted Training for Cross-Task Learning

Shuxiao Chen; Koby Crammer; Hangfeng He; Dan Roth; Weijie J. Su

arXiv:2105.14095 · cs.LG · March 2, 2022 · 6 cites

TL;DR

This paper presents Target-Aware Weighted Training (TAWT), a simple and efficient weighted training algorithm for cross-task learning that minimizes a representation-based distance between the source and target tasks. It comes with theoretical guarantees and is applied successfully to NLP sequence tagging tasks.

Contribution

The paper introduces TAWT, a weighted training method for cross-task learning with non-asymptotic learning-theoretic guarantees, and demonstrates its practical effectiveness on NLP sequence tagging tasks.

Findings

01

TAWT is easy to implement and computationally efficient.

02

TAWT achieves strong performance on multiple NLP sequence tagging tasks.

03

The representation-based task distance provides theoretical insights into cross-task learning.

Abstract

In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in natural language processing (NLP), including part-of-speech (PoS) tagging, chunking, predicate detection, and named entity recognition (NER). As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning, such as the choice of the source data and the impact of fine-tuning.
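The abstract describes TAWT only at a high level, so the following is a minimal sketch of the general idea of weighted training over source tasks, not the authors' implementation. It assumes that each source task gets a nonnegative weight, that the weights sum to one, and that they are adjusted with a multiplicative (exponentiated-gradient) step. The function names and the `scores` input are hypothetical: `scores` stands in for whatever signal the representation-based task distance would supply about how useful each source task is for the target.

```python
import math


def weighted_loss(task_losses, weights):
    """Combine per-source-task losses into one weighted training objective."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


def update_weights(weights, scores, step_size=1.0):
    """One multiplicative update that keeps the weights on the simplex.

    `scores` is a hypothetical relevance signal (an assumption of this
    sketch): a larger score means the source task looks more useful for
    the target task, so its weight grows.
    """
    unnormalized = [w * math.exp(step_size * s) for w, s in zip(weights, scores)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three hypothetical source tasks, starting from uniform weights.
weights = [1 / 3, 1 / 3, 1 / 3]
losses = [0.9, 0.4, 1.2]      # illustrative per-task losses
scores = [0.5, -0.2, 0.1]     # illustrative relevance scores

weights = update_weights(weights, scores)
objective = weighted_loss(losses, weights)
```

In a full training loop, a step like `update_weights` would alternate with ordinary gradient steps on the model parameters using `weighted_loss` as the objective. How the relevance signal is actually computed is specified in the paper and the official repository, not here.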

Code & Models

Repositories

HornHehhf/TAWT
PyTorch · Official

Videos

Weighted Training for Cross-Task Learning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Dropout · Softmax