Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan, Marzyeh Ghassemi

TL;DR
This paper investigates how spurious features affect self-supervised learning (SSL) of visual representations, shows that common augmentations can induce undesired invariances, and proposes a pruning-based method that improves invariance and downstream performance.
Contribution
It introduces LateTVG, a pruning-based regularization technique that acts on the later layers of the encoder during SSL pre-training to remove spurious information and yield more invariant representations.
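To make "pruning the later layers" concrete, here is a minimal sketch of unstructured magnitude pruning applied only to the last few layers of a toy MLP encoder. It is not the authors' implementation: the pruning criterion (smallest absolute weight), the `sparsity` and `n_late` parameters, and the helper names are all assumptions, and how LateTVG feeds the pruned layers into the SSL objective is not modeled.

```python
# Hypothetical sketch: magnitude pruning restricted to the last layers of a
# toy encoder. Assumption: this only illustrates "pruning later layers";
# LateTVG's actual criterion, schedule, and loss are not reproduced here.

def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight matrix with the smallest-|w| entries zeroed.

    `sparsity` is the fraction of entries to zero out, in [0, 1).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    cols = len(weights[0])
    flat = [abs(w) for row in weights for w in row]
    n_prune = int(len(flat) * sparsity)
    # sorted() is stable, so ties are broken by position and exactly
    # n_prune entries are removed.
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    pruned = set(order[:n_prune])
    return [
        [0.0 if r * cols + c in pruned else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


def prune_late_layers(layers, n_late, sparsity):
    """Prune only the last `n_late` weight matrices; earlier ones are copied unchanged."""
    if not 0 <= n_late <= len(layers):
        raise ValueError("n_late must be between 0 and len(layers)")
    cut = len(layers) - n_late
    early = [[list(row) for row in w] for w in layers[:cut]]
    late = [magnitude_prune(w, sparsity) for w in layers[cut:]]
    return early + late


def forward(layers, x):
    """Toy encoder: each layer is an (out x in) matrix followed by ReLU."""
    h = x
    for w in layers:
        h = [max(0.0, sum(wij * hj for wij, hj in zip(row, h))) for row in w]
    return h


# Usage: a two-layer toy encoder where only the final layer is pruned.
encoder = [
    [[0.5, -0.1], [0.2, 0.9]],  # early layer: left untouched
    [[0.1, -2.0], [0.5, 3.0]],  # late layer: half of its weights zeroed
]
pruned_encoder = prune_late_layers(encoder, n_late=1, sparsity=0.5)
print(forward(encoder, [1.0, 1.0]), forward(pruned_encoder, [1.0, 1.0]))
```

Restricting pruning to the final layers leaves the early feature extractors intact, which matches the reviews' description of the method as later-layer regularization.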
Findings
Common SSL augmentations can cause undesired invariances.
Dataset re-sampling does not reliably remove spurious features.
LateTVG improves representation quality and benchmark performance.
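For reference, "dataset re-sampling" usually means rebalancing the data so that under-represented groups are seen as often as the majority. The sketch below shows one common variant, upsampling every group with replacement to the size of the largest group. It assumes each example's group id (for example a label/spurious-attribute pair) is known, which is precisely the information SSL pre-training typically lacks; the function name and the upsampling choice are illustrative assumptions, not the baseline used in the paper.

```python
import random
from collections import defaultdict


def group_balanced_indices(groups, seed=0):
    """Upsample every group (with replacement) to the size of the largest group.

    groups[i] is a hashable group id for example i. Assumption: these ids
    are available, which SSL pre-training usually cannot rely on.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    target = max(len(idx) for idx in by_group.values())
    out = []
    for g in sorted(by_group, key=repr):  # fixed order for reproducibility
        idx = by_group[g]
        out.extend(idx)  # keep every original example once
        out.extend(rng.choice(idx) for _ in range(target - len(idx)))
    rng.shuffle(out)
    return out


# Usage: 8 majority-group examples and 2 minority-group examples.
groups = ["majority"] * 8 + ["minority"] * 2
balanced = group_balanced_indices(groups)
print(len(balanced))  # 16: both groups now contribute 8 indices
```

Balancing the counts this way does not by itself stop an encoder from using a spurious cue, which is consistent with the finding that re-sampling does not reliably remove spurious features.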
Abstract
Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods which learn from unlabeled data, the extent to which these representations rely on spurious features for prediction is unclear. In this work, we explore the impact of spurious features on Self-Supervised Learning (SSL) for visual representation learning. We first empirically show that commonly used augmentations in SSL can cause undesired invariances in the image space, and illustrate this with a simple example. We further show that classical approaches in combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representations. Motivated by these findings, we propose…
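The claim that augmentations can cause undesired invariances can be illustrated with a toy of our own (it is not the paper's example). Each input has a "core" object feature and a "background" feature. Assume a random crop sometimes cuts the object out, modeled here as zeroing the core feature while the background survives. The alignment term of an SSL loss, the expected squared distance between two views' embeddings, is then zero for an encoder that reads only the background and positive for one that reads the object. In other words, the augmentation pushes the representation toward invariance to the object itself. The feature layout, `drop_rate`, and the linear encoder are assumptions, and the uniformity or negative-pair term that prevents collapse is deliberately omitted.

```python
# Toy illustration (hypothetical setup): two scalar features per input,
# x = (core, background). Assumption: a crop drops the object with
# probability drop_rate, which we model as zeroing the core feature.

def crop(x, keep_core):
    core, background = x
    return (core if keep_core else 0.0, background)


def encode(w, x):
    """Linear 1-D encoder z = w . x."""
    return w[0] * x[0] + w[1] * x[1]


def expected_alignment_loss(w, data, drop_rate):
    """Expected squared distance between the embeddings of two views.

    View 1 always keeps the object; view 2 loses it with probability
    drop_rate. Only the alignment term is computed here.
    """
    total = 0.0
    for x in data:
        z1 = encode(w, crop(x, keep_core=True))
        z_keep = encode(w, crop(x, keep_core=True))
        z_drop = encode(w, crop(x, keep_core=False))
        total += (1 - drop_rate) * (z1 - z_keep) ** 2 + drop_rate * (z1 - z_drop) ** 2
    return total / len(data)


data = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
core_only = (1.0, 0.0)
background_only = (0.0, 1.0)
print(expected_alignment_loss(core_only, data, drop_rate=0.3))        # positive
print(expected_alignment_loss(background_only, data, drop_rate=0.3))  # 0.0
```

If the background is spuriously correlated with the label, a linear probe trained on such a representation inherits that reliance.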
Peer Reviews
Decision: ICLR 2024 spotlight
**1. Originality:** The paper introduces a novel method, "LateTVG," to mitigate the influence of spurious correlations in Self-Supervised Learning (SSL) for visual representation learning. The approach is original in its use of later-layer regularization via pruning to enhance the robustness of SSL models. Unlike conventional methods that often rely on re-sampling or require group or label information, LateTVG ensures invariant representations without such dependencies. …
**1. Scalability and Efficiency:** The paper introduces the LateTVG method, which regularizes the later layers of the encoder via pruning. While this approach is novel, its scalability and computational efficiency in large-scale settings are not thoroughly addressed. Pruning, especially in deeper layers, can be computationally intensive and may not scale well to very deep networks or extremely large datasets. **Actionable Insight:** Future work could focus on optimizing …
The strengths of the paper are as follows. Theoretical Insights: by analyzing simpler cases (Section 3.3), the paper provides theoretical arguments that deepen our understanding of how common augmentations used in Self-Supervised Learning (SSL) pre-training affect the model's reliance on spurious features for downstream linear classifiers. Experimental Evaluation of Spurious Feature Learning: the paper empirically explores the extent of spurious feature learning in self-supervised representations …
The paper is missing references to some important works on spurious feature learning: 1. Salient ImageNet: How to discover spurious features in deep learning? 2. WILDS: A Benchmark of in-the-Wild Distribution Shifts. Most of the results in the paper use smaller datasets. I believe the analysis of Section 4 could have been carried out on a large number of publicly available SSL-trained models. I would have liked to see some results using models trained on large datasets. To evaluate …
1. The problem of spurious correlations has not been studied for SSL before and is important, since the lack of group or label information in SSL can make it hard to remedy. 2. The conclusion that balancing the data may not be effective at remedying spurious correlations is extremely interesting. 3. The proposed method is effective at improving worst-group accuracy even where standard methods such as group balancing fail.
1. The intuition and reasoning behind the choice to prune the later layers in order to forget the spurious feature are not sufficiently explained or discussed.
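The reviews evaluate the method by worst-group accuracy: the lowest accuracy over a set of predefined groups, where a group is typically a (label, spurious attribute) pair. The sketch below computes it from hypothetical linear-probe predictions; the data and function names are illustrative only.

```python
def group_accuracies(preds, labels, groups):
    """Accuracy computed separately within each group."""
    correct, total = {}, {}
    for p, y, g in zip(preds, labels, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds, labels, groups):
    """Minimum per-group accuracy."""
    return min(group_accuracies(preds, labels, groups).values())


# Usage with hypothetical predictions; groups pair each label with a spurious attribute.
preds = [1, 1, 0, 0, 1]
labels = [1, 1, 0, 1, 0]
attrs = ["A", "A", "B", "B", "B"]
groups = list(zip(labels, attrs))
average = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(average, worst_group_accuracy(preds, labels, groups))  # 0.6 0.0
```

The gap between the two numbers is the point of the metric: a model can look reasonable on average while failing completely on a minority group.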
Taxonomy
Topics: Text and Document Classification Technologies · Data Quality and Management
