Why Do Better Loss Functions Lead to Less Transferable Features?
Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi

TL;DR
This study investigates how different loss functions influence the transferability of CNN features. It shows that better accuracy on the original task can come at the cost of fixed features that are less useful for new tasks, owing to a trade-off between invariance and transferability.
Contribution
It demonstrates that loss functions improving in-task accuracy may impair feature transferability, highlighting a trade-off between optimizing for the original task and downstream applications.
Findings
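Many loss functions improve ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed features transfer worse to downstream tasks.
Differences among loss functions, measured with centered kernel alignment, appear mainly in the last few layers.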
Higher class separation correlates with better original task accuracy but worse transfer to new tasks.
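The summary does not say how class separation is measured. One plausible metric, sketched here as an assumption rather than the paper's exact definition, compares the average cosine distance between examples of the same class with the average cosine distance across all pairs of classes in a layer's embeddings. Values near 1 mean tight, well-separated class clusters; values near 0 mean class membership explains little of the spread. The function name `class_separation` and the equal weighting of classes are illustrative choices.

```python
import numpy as np


def class_separation(features: np.ndarray, labels: np.ndarray) -> float:
    """Return R^2 = 1 - (mean within-class distance) / (mean overall distance).

    Assumptions (not from the source): cosine distance, averages taken over all
    (m, n) pairs including self-pairs, and every class weighted equally.
    """
    # L2-normalize each embedding so dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    dist = 1.0 - z @ z.T  # pairwise cosine distances, shape (n, n)
    classes = np.unique(labels)

    # Average distance between examples of the same class, averaged over classes.
    d_within = np.mean(
        [dist[np.ix_(labels == c, labels == c)].mean() for c in classes]
    )
    # Average distance between examples of every class pair (a, b), including a == b.
    d_total = np.mean(
        [dist[np.ix_(labels == a, labels == b)].mean() for a in classes for b in classes]
    )
    return float(1.0 - d_within / d_total)
```

Under this definition, a layer whose embeddings collapse each class to a single direction scores 1, which is the kind of representation the findings associate with higher ImageNet accuracy but weaker transfer.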
Abstract
Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few…
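The abstract names centered kernel alignment (CKA) as its similarity measure without defining it. A minimal sketch of the linear variant follows, assuming two activation matrices computed on the same n examples (one row per example, any number of columns); the function name `linear_cka` is hypothetical. It uses the identity linear CKA = ||YᵀX||²_F / (||XᵀX||_F · ||YᵀY||_F) on column-centered matrices.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activations x (n, p1) and y (n, p2) for the same n examples."""
    # Center every feature (column) so that constant offsets do not count as similarity.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Normalized squared Frobenius norm of the cross-covariance.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Because the score is invariant to rotations and isotropic rescaling of either representation, it can compare, for example, the same layer of two networks trained with different losses even when their units are not aligned. The result lies between 0 and 1, with 1 for representations that match up to such transforms.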
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: Label Smoothing · Dropout · Softmax
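Label smoothing, listed among the methods, is one kind of alternative objective that can be compared against vanilla softmax cross-entropy. A minimal NumPy sketch follows, assuming the common formulation that mixes the one-hot target with a uniform distribution over K classes. The function name `smoothed_cross_entropy` and the `alpha` parameter are illustrative, not taken from the paper.

```python
import numpy as np


def smoothed_cross_entropy(logits, labels, alpha=0.1):
    """Mean softmax cross-entropy with label smoothing; alpha=0 gives the vanilla loss."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    n, k = logits.shape
    # Numerically stable log-softmax: subtract the per-row max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Smoothed target distribution: (1 - alpha) * one_hot + alpha / k.
    targets = np.full((n, k), alpha / k)
    targets[np.arange(n), labels] += 1.0 - alpha
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With `alpha > 0`, a very confident correct prediction is penalized more than under the vanilla loss, which discourages arbitrarily large logits.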
