Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Ananya Kumar; Aditi Raghunathan; Robbie Jones; Tengyu Ma; Percy Liang

arXiv:2202.10054 · cs.LG · February 24, 2022 · 159 cites

TL;DR

Fine-tuning a pretrained model can hurt out-of-distribution (OOD) accuracy by distorting the pretrained features; a two-step approach that first trains a linear probe and then fine-tunes the full model (LP-FT) mitigates this and improves overall performance.

Contribution

This paper shows that fine-tuning can degrade OOD performance relative to linear probing, and studies a simple two-step method, linear probing followed by full fine-tuning (LP-FT), that combines the strengths of both.

Findings

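1. Fine-tuning can reduce OOD accuracy compared to linear probing, particularly when the pretrained features are good and the distribution shift is large.

2. LP-FT (linear probing, then fine-tuning) outperforms both fine-tuning and linear probing on multiple datasets.

3. A theoretical analysis in a simple overparameterized two-layer setting explains how fine-tuning distorts pretrained features.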
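To make these findings concrete, here is a minimal Python sketch on a hypothetical toy problem (not the authors' code or experiments). It assumes a two-layer linear model f(x) = v·(b·x) with a one-dimensional feature, "pretrained" features b0 = (1, 1) that are already correct for the target y = x[0] + x[1], ID inputs that vary only along x[0], and OOD inputs that also use x[1]. The step size, step count, and zero head initialization for plain fine-tuning are illustrative assumptions, and the helper names (`linear_probe`, `fine_tune`) are hypothetical.

```python
# Hypothetical toy model (not the authors' code): f(x) = v * (b . x),
# i.e. a two-layer linear network with a 1-D feature z = b . x and scalar head v.

def feature(b, x):
    """Feature z = b . x for a 2-D input x."""
    return b[0] * x[0] + b[1] * x[1]


def predict(v, b, x):
    """Model output: head v applied to the feature."""
    return v * feature(b, x)


def mse(v, b, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum((predict(v, b, x) - y) ** 2 for x, y in data) / len(data)


def linear_probe(b, data):
    """Linear probing: fit only the head v by least squares, features b frozen."""
    zs = [feature(b, x) for x, _ in data]
    ys = [y for _, y in data]
    return sum(z * y for z, y in zip(zs, ys)) / sum(z * z for z in zs)


def fine_tune(v, b, data, lr=0.05, steps=500):
    """Full fine-tuning: full-batch gradient descent on 0.5 * MSE over v and b."""
    b = list(b)
    n = len(data)
    for _ in range(steps):
        gv = g0 = g1 = 0.0
        for x, y in data:
            z = feature(b, x)
            r = v * z - y            # residual
            gv += r * z / n          # dL/dv
            g0 += r * v * x[0] / n   # dL/db[0]
            g1 += r * v * x[1] / n   # dL/db[1]
        # Simultaneous update of head and features from the same gradients.
        v = v - lr * gv
        b = [b[0] - lr * g0, b[1] - lr * g1]
    return v, b


# Assumed setup: pretrained features b0 are already "good" (true y = x[0] + x[1]);
# ID inputs only vary along x[0], while OOD inputs also use x[1].
b0 = (1.0, 1.0)
id_data = [((t, 0.0), t) for t in (1.0, 2.0, -1.0)]
ood_data = [((0.0, 1.0), 1.0), ((0.0, -2.0), -2.0)]

v_lp = linear_probe(b0, id_data)                 # LP: train the head only
v_ft, b_ft = fine_tune(0.0, b0, id_data)         # FT: zero-initialized head (assumption)
v_lpft, b_lpft = fine_tune(v_lp, b0, id_data)    # LP-FT: start FT from the LP head

for name, v, b in [("LP", v_lp, b0), ("FT", v_ft, b_ft), ("LP-FT", v_lpft, b_lpft)]:
    print(f"{name:6s} ID MSE={mse(v, b, id_data):.4f}  "
          f"OOD MSE={mse(v, b, ood_data):.4f}  head v={v:.3f}  features b={b}")
```

Because every ID input has x[1] = 0, the gradient with respect to b[1] is exactly zero, so fine-tuning reshapes the feature only along the ID direction (b[0]) while the head v drifts away from its correct value; OOD predictions, which depend on v·b[1], inherit that error even though ID error goes to zero. Linear probing recovers the correct head in closed form, and LP-FT then starts at a zero-residual point, so its fine-tuning stage leaves both head and features unchanged.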

Abstract

When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer -- the "head"). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR → STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher accuracy ID but 7% lower accuracy OOD than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting: fine-tuning overparameterized two-layer…
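The feature-distortion mechanism behind this tradeoff can be illustrated with a short gradient calculation. This is a sketch under assumed notation, not the paper's theorem: a two-layer linear model f(x) = vᵀBx with features B ∈ ℝ^{k×d}, head v ∈ ℝ^k, squared loss on ID inputs x_1, ..., x_n, and an OOD direction x_⊥ orthogonal to all of them.

```latex
L(v, B) = \frac{1}{2n}\sum_{i=1}^{n}\bigl(v^\top B x_i - y_i\bigr)^2,
\qquad
\nabla_B L = \frac{1}{n}\sum_{i=1}^{n} r_i\, v\, x_i^\top,
\qquad r_i = v^\top B x_i - y_i .

% For any x_\perp with x_i^\top x_\perp = 0 for every training input:
\nabla_B L \, x_\perp = \frac{1}{n}\sum_{i=1}^{n} r_i\, v\,\bigl(x_i^\top x_\perp\bigr) = 0
\;\Longrightarrow\;
B_t x_\perp = B_0 x_\perp \quad \text{for every gradient step } t .

% Meanwhile the head moves from v_0 to v_t, so the OOD prediction is
f_t(x_\perp) = v_t^\top B_0 x_\perp \neq v_\star^\top B_0 x_\perp \quad \text{in general.}
```

Here v_⋆ denotes a head that is correct on top of the pretrained features B_0 (an assumption of this sketch). Fine-tuning from a poor head changes v and reshapes B only on the span of the ID inputs, while B_0 x_⊥ stays frozen, so the OOD error is never corrected. Linear probing keeps B = B_0 and can recover v_⋆ from ID data when the features B_0 x_i span the feature space.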

Videos

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Imaging and Analysis

Methods: Linear Layer