Optimal transfer protocol by incremental layer defrosting
Federica Gerace, Diego Doimo, Stefano Sarao Mannelli, Luca Saglietti, Alessandro Laio

TL;DR
This paper investigates optimal transfer learning protocols, demonstrating that selectively unfreezing layers based on data availability and task similarity can significantly improve model performance.
Contribution
The paper introduces a controlled framework for identifying the optimal transfer depth, challenging the standard protocol of freezing the feature-extractor layers.
Findings
Optimal transfer depth depends on data amount and task similarity.
Selective layer unfreezing yields better transfer performance.
Internal representation analysis explains transfer optimality.
Abstract
Transfer learning is a powerful tool enabling model training with limited amounts of data. This technique is particularly useful in real-world problems where data availability is often a serious limitation. The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a data-poor target task. This workflow is based on the assumption that the feature maps of the pre-trained model are qualitatively similar to the ones that would have been learned with enough data on the target task. In this work, we show that this protocol is often sub-optimal, and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen. In particular, we make use of a controlled framework to identify the optimal transfer depth, which turns out…
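The idea of a "transfer depth" can be made concrete with a minimal NumPy sketch. It rests entirely on our own assumptions, not on the authors' setup: a small tanh MLP trained by full-batch gradient descent, the first k layers copied from the source model and kept frozen, the remaining layers re-initialized and trained on the target data, and held-out target error used to pick k. The helper names (`init_mlp`, `train`, `transfer`, `sweep_transfer_depth`) and the synthetic teacher tasks are hypothetical illustrations, not the paper's code or experiments.

```python
import numpy as np


def init_mlp(sizes, rng):
    """One (W, b) pair per layer; sizes = [d_in, h_1, ..., d_out]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Activations of every layer: tanh hidden units, linear readout."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts


def mse(params, x, y):
    return float(np.mean((forward(params, x)[-1] - y) ** 2))


def train(params, x, y, n_frozen, lr=0.1, epochs=300):
    """Full-batch gradient descent on 0.5 * squared error; layers [0, n_frozen) stay fixed."""
    params = [(W.copy(), b.copy()) for W, b in params]
    for _ in range(epochs):
        acts = forward(params, x)
        delta = (acts[-1] - y) / len(x)  # gradient w.r.t. the readout pre-activation
        for i in reversed(range(n_frozen, len(params))):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > n_frozen:  # backpropagate only as far as the first trainable layer
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params


def transfer(source_params, fresh_params, k):
    """First k layers copied from the source model, the rest from a fresh init (assumption)."""
    return [(W.copy(), b.copy()) for W, b in source_params[:k]] + list(fresh_params[k:])


def sweep_transfer_depth(source_params, x_tr, y_tr, x_val, y_val, seed=0):
    """Try each transfer depth k (k = 0 is training from scratch); return the best k."""
    sizes = [W.shape[0] for W, _ in source_params] + [source_params[-1][0].shape[1]]
    errors = []
    for k in range(len(source_params)):  # the readout layer is always retrained
        fresh = init_mlp(sizes, np.random.default_rng(seed))
        model = train(transfer(source_params, fresh, k), x_tr, y_tr, n_frozen=k)
        errors.append(mse(model, x_val, y_val))
    return int(np.argmin(errors)), errors


# Hypothetical synthetic tasks: source and target share the hidden features A
# but use different readouts; the target task has only a few labelled samples.
rng = np.random.default_rng(0)
d, h = 20, 32
A = rng.normal(size=(d, h)) / np.sqrt(d)
v_src = rng.normal(size=(h, 1)) / np.sqrt(h)
v_tgt = rng.normal(size=(h, 1)) / np.sqrt(h)


def teacher(x, v):
    return np.tanh(x @ A) @ v


x_src = rng.normal(size=(2000, d))  # data-rich source task
x_tgt = rng.normal(size=(40, d))    # data-poor target task
x_val = rng.normal(size=(500, d))   # held-out target data for choosing k

source = train(init_mlp([d, 64, 64, 1], rng), x_src, teacher(x_src, v_src), n_frozen=0)
best_k, errs = sweep_transfer_depth(source, x_tgt, teacher(x_tgt, v_tgt),
                                    x_val, teacher(x_val, v_tgt))
print("validation MSE per depth:", errs, "best k:", best_k)
```

In this sketch k = 0 corresponds to training from scratch on the target data, and k = len(source) - 1 to the standard protocol of freezing the whole feature extractor and retraining only the readout; intermediate values keep only part of the pre-trained network frozen. Whether the non-frozen layers are re-initialized or fine-tuned from the source weights is a design choice we assumed here, not one taken from the paper.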
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Speech Recognition and Synthesis
