Investigating the Benefits of Projection Head for Representation Learning
Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi, Baharan Mirzasoleiman

TL;DR
This paper provides a theoretical understanding of why projection heads improve representation learning, revealing layer-wise feature weighting and the benefits of non-linearity, supported by empirical validation on multiple datasets.
Contribution
It offers a rigorous theoretical explanation for the benefits of projection heads, highlighting the role of implicit bias and layer-wise feature weighting in contrastive learning.
Findings
In linear models, the implicit bias of training leads to layer-wise progressive feature weighting: features become increasingly unequal in deeper layers.
Lower layers tend to have more normalized, less specialized features.
Introducing non-linearity lets lower layers learn features that are absent from higher layers.
Abstract
An effective technique for obtaining high-quality representations is adding a projection head on top of the encoder during training, then discarding it and using the pre-projection representations. Despite its proven practical effectiveness, the reason behind the success of this technique is poorly understood. The pre-projection representations are not directly optimized by the loss function, raising the question: what makes them better? In this work, we provide a rigorous theoretical answer to this question. We start by examining linear models trained with self-supervised contrastive loss. We reveal that the implicit bias of training algorithms leads to layer-wise progressive feature weighting, where features become increasingly unequal as we go deeper into the layers. Consequently, lower layers tend to have more normalized and less specialized representations. We theoretically…
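The layer-wise weighting described in the abstract can be illustrated with a small numerical sketch. This is not the paper's derivation. It assumes a two-layer linear network (encoder W1, projection head W2) whose layers are "balanced", meaning W1 W1^T = W2^T W2, a commonly studied idealization of what gradient-based training from small initialization produces in linear networks. The end-to-end feature weights (4, 1, 0.25) and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np


def balanced_two_layer_factors(end_to_end):
    """Split a matrix M into W2 @ W1 with W1 W1^T = W2^T W2 (balanced layers).

    Assumption for illustration: the trained encoder (W1) and projection
    head (W2) share the singular values of M equally, sqrt(s_i) each.
    """
    u, s, vt = np.linalg.svd(end_to_end, full_matrices=False)
    root = np.sqrt(s)
    w1 = np.diag(root) @ vt   # encoder: pre-projection representation
    w2 = u @ np.diag(root)    # projection head
    return w1, w2


def weight_spread(matrix):
    """Ratio of largest to smallest singular value (how unequal features are)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return s.max() / s.min()


rng = np.random.default_rng(0)
# Hypothetical end-to-end map that weights three features by 4, 1, and 0.25.
q_left, _ = np.linalg.qr(rng.normal(size=(3, 3)))
q_right, _ = np.linalg.qr(rng.normal(size=(3, 3)))
end_to_end = q_left @ np.diag([4.0, 1.0, 0.25]) @ q_right.T

w1, w2 = balanced_two_layer_factors(end_to_end)
pre_spread = weight_spread(w1)         # before the projection head
post_spread = weight_spread(w2 @ w1)   # after the projection head

print("pre-projection spread: ", round(pre_spread, 3))
print("post-projection spread:", round(post_spread, 3))
```

Under these assumptions the pre-projection spread is the square root of the post-projection spread (about 4 versus about 16), so the encoder output weights features more evenly than the output of the projection head. This matches the qualitative claim that lower layers are more normalized.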
Taxonomy
Topics: Educational Games and Gamification · Intelligent Tutoring Systems and Adaptive Learning · Augmented Reality Applications
Methods: Contrastive Learning
