Augmentation-aware Self-supervised Learning with Conditioned Projector
Marcin Przewięźlikowski, Mateusz Pyla, Bartosz Zieliński, Bartłomiej Twardowski, Jacek Tabor, Marek Śmieja

TL;DR
This paper introduces CASSLE, a self-supervised learning method that incorporates augmentation information into the projector to improve sensitivity to traits affected by augmentations, enhancing downstream task performance.
Contribution
It proposes a novel augmentation-aware projector: conditioning the projector on the applied augmentations pushes the feature extractor to preserve augmentation information, so SSL models better capture traits such as color that matter for downstream tasks.
Findings
CASSLE improves SSL performance across multiple methods.
Enhanced sensitivity to augmentation traits like color.
Achieves state-of-the-art results on downstream tasks.
Abstract
Self-supervised learning (SSL) is a powerful technique for learning from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo can reach quality on par with supervised approaches. However, this invariance may be detrimental for solving downstream tasks that depend on traits affected by augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about augmentations applied to images. For the projector to take advantage of this auxiliary conditioning when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined…
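The conditioning idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract only states that the projector is supplemented with augmentation information, so the concatenation of features with an embedded augmentation vector, the layer sizes, and all names (`conditioned_projector`, `aug_params`, `params`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration; none are taken from the paper.
FEAT_DIM, AUG_DIM, AUG_EMB_DIM, HIDDEN, OUT_DIM = 32, 6, 8, 64, 16


def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU in between."""
    return np.maximum(x @ w1, 0.0) @ w2


params = {
    # Small network that embeds the augmentation parameters.
    "aug_w1": rng.normal(0.0, 0.1, (AUG_DIM, HIDDEN)),
    "aug_w2": rng.normal(0.0, 0.1, (HIDDEN, AUG_EMB_DIM)),
    # Projector that sees features AND the augmentation embedding.
    "proj_w1": rng.normal(0.0, 0.1, (FEAT_DIM + AUG_EMB_DIM, HIDDEN)),
    "proj_w2": rng.normal(0.0, 0.1, (HIDDEN, OUT_DIM)),
}


def conditioned_projector(features, aug_params, p):
    """Project features while conditioning on augmentation parameters.

    Assumption: conditioning is done by concatenation. Other ways of
    combining the two inputs are possible.
    """
    aug_emb = mlp(aug_params, p["aug_w1"], p["aug_w2"])
    joint = np.concatenate([features, aug_emb], axis=1)
    return mlp(joint, p["proj_w1"], p["proj_w2"])


# Same encoder features, two different (made-up) augmentation descriptions.
features = rng.normal(size=(4, FEAT_DIM))
aug_a = rng.uniform(size=(4, AUG_DIM))
aug_b = rng.uniform(size=(4, AUG_DIM))
z_a = conditioned_projector(features, aug_a, params)
z_b = conditioned_projector(features, aug_b, params)
```

In this sketch the projector output changes with the augmentation vector even when the encoder features are identical. That is the property the abstract relies on: the SSL objective is applied to the projector output, while downstream tasks use the encoder features.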
Peer Reviews
Decision·Submitted to ICLR 2024
- This paper is generally well-written. It is easy to understand.
- The idea is simple, intuitive, and seems to be widely applicable.
- The proposed method, CASSLE, outperforms baselines (LooC, AugSelf, and AI) that also learn augmentation-aware information.
**(1) Lack of comparison with recent augmentation-free SSL methods.** Many augmentation-free self-supervised learning methods have recently been proposed, including data2vec [1-2], I-JEPA [3], and Masked Image Modeling (MIM) [4-5]. These methods do not use augmentation; in other words, they aim to learn full information about the original images rather than augmentation-invariant representations. Also, since they are often better than MoCo-v2 and SimCLR in vari
1. The identified problem is known and significant for representation learning. The authors discuss the related literature and existing approaches to solving it fairly well.
2. The idea is fairly novel, although there have been some similar approaches that essentially “condition the projector network”. Please refer to Question 1.
3. Nonetheless, the results are generally convincing that the difference lies in the implementation details rather than in the concept.
4. The paper is well-written and well-argued. Overall
1. Experiments remain relatively small-scale in dataset and model size. In particular, it would have been interesting to examine the effect of conditioning as pretraining data becomes abundant.
2. Compared to AugSelf, CASSLE performs better for contrastive methods and BarlowTwins than for the others, i.e. BYOL and SimSiam. A discussion of why this happens would be interesting.
3. Semi-supervised (few-shot classification) results are competitive, but weaker.
4. Experiments on object detection task demonstrat
* The manuscript is well written and experiments are well picked to test the purported claims regarding sensitivity of learned features to augmentations applied during training.
* CASSLE is simple and has demonstrated efficacy when training augmentation-based contrastive models. When compared to other methods that condition on augmentations applied during training, Table 1 shows that CASSLE has superior performance across many datasets.
* Based on Table 7, the proposed method seems to be less effective for SimSiam and BYOL than for InfoNCE-based methods. The manuscript currently claims that CASSLE is applicable to all joint-embedding architectures, but the current experimental results do not demonstrate this.
* The experiments in 4.2 use InfoNCE to evaluate augmentation-awareness, which is sensitive to the negative examples that are used. Instead of this, why not perform linear probing to predict the specific augmentation a
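For readers unfamiliar with the objective mentioned in the last point, a generic InfoNCE loss can be sketched as below. This is the standard in-batch formulation, not the evaluation code from the paper; the function name `info_nce`, the temperature value, and the random test data are assumptions for illustration.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """Generic in-batch InfoNCE loss.

    Row i of z_a and row i of z_b form the positive pair; every other
    row of z_b serves as a negative for row i of z_a.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal entries as the correct class.
    return -np.mean(np.diag(log_prob))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce(z, z)                         # correct positives
loss_mismatched = info_nce(z, np.roll(z, 1, axis=0))  # wrong positives
```

Because the denominator sums over every other item in the batch, the loss value depends on which negatives happen to be present, which is the sensitivity this point raises.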
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Mycobacterium research and diagnosis · Cancer-related molecular mechanisms research
Methods: Average Pooling · 1x1 Convolution · Residual Connection · Convolution · Global Average Pooling · Dense Connections · Batch Normalization · Bottleneck Residual Block · InfoNCE
