# To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

**Authors:** Matthew E. Peters, Sebastian Ruder, Noah A. Smith

arXiv: 1903.05987 · 2019-06-12

## TL;DR

This paper asks whether to fine-tune a pretrained NLP model or to use it as a frozen feature extractor when adapting it to a new task, and shows that the better choice depends on how similar the pretraining and target tasks are.

## Contribution

The paper compares fine-tuning and feature extraction empirically across diverse NLP tasks with two state-of-the-art models, explores possible explanations for the differences it observes, and offers guidelines for choosing between the two methods.

## Key findings

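- Fine-tuning outperforms feature extraction when the target task is similar to the pretraining task.
- Feature extraction is preferable when the target task is dissimilar to the pretraining task.
- The paper distills these results into a set of adaptation guidelines for NLP practitioners.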

## Abstract

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
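The difference between the two adaptation modes named in the abstract can be illustrated with a toy sketch. This is not the paper's experimental setup: the one-weight "encoder", the `train` function, the learning rate, and the toy target `y = 3x` are all hypothetical stand-ins chosen only to show that feature extraction leaves the pretrained weight untouched while fine-tuning updates it along with the new task-specific weight.

```python
import random


def train(freeze_encoder, steps=200, lr=0.05, seed=0):
    """Fit y = head * (encoder * x) by SGD on squared error.

    `encoder` stands in for pretrained weights and `head` for a new
    task-specific layer. Both are single floats (a toy assumption).
    """
    rng = random.Random(seed)
    encoder = 1.0  # hypothetical "pretrained" weight
    head = 0.0     # freshly initialised task-specific weight
    data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]  # toy target: y = 3x

    for _ in range(steps):
        x, y = rng.choice(data)
        feature = encoder * x          # output of the "pretrained" model
        err = head * feature - y       # prediction error

        # Gradients of err**2 with respect to each weight,
        # both computed before any weight is changed.
        grad_head = 2.0 * err * feature
        grad_encoder = 2.0 * err * head * x

        head -= lr * grad_head         # the head is always trained
        if not freeze_encoder:         # fine-tuning also updates the encoder
            encoder -= lr * grad_encoder

    return encoder, head


if __name__ == "__main__":
    print("feature extraction:", train(freeze_encoder=True))
    print("fine-tuning:       ", train(freeze_encoder=False))
```

With `freeze_encoder=True` the returned encoder weight is still its initial value and only the head has moved; with `freeze_encoder=False` both weights change. In a real setting the encoder would be a full pretrained network, and freezing would typically be done by excluding its parameters from the optimizer.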

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.05987/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1903.05987/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1903.05987/full.md

---
Source: https://tomesphere.com/paper/1903.05987