The (In)Effectiveness of Intermediate Task Training For Domain Adaptation and Cross-Lingual Transfer Learning

Sovesh Mohapatra; Somesh Mohapatra

arXiv:2210.01091 · cs.CL · November 8, 2022

TL;DR

This paper investigates the effectiveness of intermediate task training in transfer learning for NLP, finding that direct fine-tuning often outperforms intermediate task training, except on more generalized tasks, and offers guidance for NLP practitioners.

Contribution

It analyzes when intermediate task training helps or hinders transfer learning, covering three NLP tasks (text classification, sentiment analysis, and sentence similarity) and three models (BERT, RoBERTa, and XLNet).

Findings

1. Fine-tuning without intermediate training often yields better performance.
2. Intermediate training benefits more generalized tasks.
3. Results vary depending on task specificity and the model used.

Abstract

Transfer learning from large language models (LLMs) has emerged as a powerful technique that enables knowledge-based fine-tuning for a number of tasks and the adaptation of models to different domains and even languages. However, it remains an open question whether and when transfer learning will work, i.e., lead to positive rather than negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks (text classification, sentiment analysis, and sentence similarity) using three LLMs (BERT, RoBERTa, and XLNet). We evaluate their performance when fine-tuned on target datasets for domain and cross-lingual adaptation tasks, with and without intermediate task training on a larger dataset. Our experiments showed that fine-tuning without intermediate task training can lead to better performance for most tasks, while more generalized tasks might…
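To make the two pipelines compared in the abstract concrete, here is a toy sketch in Python. It is not the authors' code: a small logistic-regression model trained with SGD on synthetic data stands in for BERT, RoBERTa, or XLNet, and every name here (`train`, `make_task`, `compare_pipelines`, `transfer_type`) and every synthetic task is hypothetical. The "direct" pipeline fine-tunes from a fresh initialization on the small target dataset; the "staged" pipeline first trains on a larger intermediate dataset and then fine-tunes on the same target data. Transfer is labeled positive or negative by the sign of the staged score minus the direct score.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy setup: a logistic-regression "model" stands in for an LLM.
Example = Tuple[List[float], int]  # (feature vector with bias term first, 0/1 label)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(weights: List[float], data: List[Example],
          epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Continue training from `weights` with per-example SGD on log loss."""
    w = list(weights)  # copy so the caller's weights are never mutated
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of log loss w.r.t. the logit
            for i in range(len(w)):
                w[i] -= lr * err * x[i]
    return w


def accuracy(weights: List[float], data: List[Example]) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(
        1 for x, y in data
        if (sigmoid(sum(wi * xi for wi, xi in zip(weights, x))) >= 0.5) == (y == 1)
    )
    return correct / len(data)


def make_task(rng: random.Random, rule: List[float], n: int) -> List[Example]:
    """Synthetic binary task: label is 1 when the linear `rule` scores x above 0."""
    data = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(rule) - 1)]
        y = 1 if sum(r * xi for r, xi in zip(rule, x)) > 0 else 0
        data.append((x, y))
    return data


def compare_pipelines(intermediate: List[Example], target_train: List[Example],
                      target_test: List[Example], dim: int) -> Tuple[float, float]:
    """Return (direct, staged) accuracy on the target test set."""
    init = [0.0] * dim
    direct = train(init, target_train)                       # target fine-tuning only
    staged = train(train(init, intermediate), target_train)  # intermediate, then target
    return accuracy(direct, target_test), accuracy(staged, target_test)


def transfer_type(direct_score: float, staged_score: float, tol: float = 1e-9) -> str:
    """Label transfer by the sign of (staged - direct)."""
    gain = staged_score - direct_score
    if gain > tol:
        return "positive"
    if gain < -tol:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    rng = random.Random(0)
    target_rule = [0.2, 1.0, -1.0]
    target_train = make_task(rng, target_rule, 20)   # small target dataset
    target_test = make_task(rng, target_rule, 500)
    for name, rule in [("related", [0.0, 1.0, -0.8]),       # roughly agrees with the target
                       ("conflicting", [0.0, -1.0, 1.0])]:  # opposes the target
        intermediate = make_task(rng, rule, 1000)  # larger intermediate dataset
        d, s = compare_pipelines(intermediate, target_train, target_test, dim=3)
        print(f"{name}: direct={d:.3f} staged={s:.3f} -> {transfer_type(d, s)}")
```

Whether the staged pipeline wins in this toy depends on how well the intermediate rule agrees with the target rule and on how much target fine-tuning follows; the numbers it prints say nothing about the paper's actual results.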

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Attention Dropout · WordPiece · Dropout · Layer Normalization · Softmax · BERT