Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks
Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk

TL;DR
This paper investigates how overparameterization and the double descent phenomenon influence the generalization performance of transfer learning in deep neural networks, considering factors like dataset size, layer freezing, and task similarity.
Contribution
It provides a novel analysis of transfer learning through the lens of double descent and overparameterization, highlighting how these factors affect generalization and transfer success.
Findings
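The double descent effect in the test error during target DNN training is more pronounced when the target training dataset is sufficiently large.
A larger source training dataset can slow target DNN training.
The number of frozen layers influences whether the target DNN is under- or overparameterized.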
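The frozen-layer finding can be illustrated with a minimal sketch. It assumes a plain fully connected network with hypothetical layer widths (not taken from the paper) and uses a deliberately crude proxy: the target training counts as overparameterized when the number of trainable parameters exceeds the number of target training examples. The function names, the widths, and this criterion are all assumptions made for illustration; the paper's own notion of effective parameterization may differ.

```python
from typing import List

# Hypothetical layer widths (input, two hidden layers, output); not from the paper.
WIDTHS = [32, 64, 64, 10]


def layer_param_counts(widths: List[int]) -> List[int]:
    """Return the weight-plus-bias count of each dense layer."""
    return [(n_in + 1) * n_out for n_in, n_out in zip(widths, widths[1:])]


def trainable_params(widths: List[int], num_frozen: int) -> int:
    """Count parameters left trainable when the first `num_frozen` layers are frozen."""
    counts = layer_param_counts(widths)
    if not 0 <= num_frozen <= len(counts):
        raise ValueError("num_frozen must be between 0 and the number of layers")
    return sum(counts[num_frozen:])


def regime(widths: List[int], num_frozen: int, num_target_examples: int) -> str:
    """Crude proxy (an assumption): compare trainable parameters with target examples."""
    p = trainable_params(widths, num_frozen)
    return "overparameterized" if p > num_target_examples else "underparameterized"


if __name__ == "__main__":
    for k in range(len(WIDTHS)):
        print(k, trainable_params(WIDTHS, k), regime(WIDTHS, k, 1000))
```

With these assumed widths and 1,000 target examples, freezing zero or one layer leaves more trainable parameters than examples, while freezing two or three layers leaves fewer, so the same architecture moves between regimes purely through the freezing choice.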
Abstract
We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective -- featuring interpolation of the training data (i.e., approximately zero train error) and the double descent phenomenon -- to explain the delicate effect of the transfer learning setting on generalization performance. We study how the generalization behavior of transfer learning is affected by the dataset size in the source and target tasks, the number of transferred layers that are kept frozen in the target DNN training, and the similarity between the source and target tasks. We show that the test error evolution during the target DNN training has a more significant double descent effect when the target training dataset is sufficiently large. In addition, a larger source training dataset can yield a slower target DNN training. Moreover, we demonstrate…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Multi-Head Attention · Attention Is All You Need · Bottleneck Residual Block · Max Pooling · Residual Block · Average Pooling · Kaiming Initialization · 1x1 Convolution · Convolution
