Augmentation-aware Self-supervised Learning with Conditioned Projector

Marcin Przewięźlikowski; Mateusz Pyla; Bartosz Zieliński; Bartłomiej Twardowski; Jacek Tabor; Marek Śmieja

arXiv:2306.06082 · cs.CV · October 22, 2024 · 1 citation

TL;DR

This paper introduces CASSLE, a self-supervised learning method that incorporates augmentation information into the projector to improve sensitivity to traits affected by augmentations, enhancing downstream task performance.

Contribution

The paper proposes a projector conditioned on augmentation information, which encourages the feature extractor to preserve that information so that SSL models better capture traits, such as color, that are relevant to downstream tasks.

Findings

01. CASSLE improves SSL performance across multiple methods.
02. Enhanced sensitivity to augmentation traits like color.
03. Achieves state-of-the-art results on downstream tasks.

Abstract

Self-supervised learning (SSL) is a powerful technique for learning from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo can reach quality on par with supervised approaches. However, this invariance may be detrimental for solving downstream tasks that depend on traits affected by augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about augmentations applied to images. For the projector to take advantage of this auxiliary conditioning when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined…
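The conditioning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the gmum/cassle implementation: the use of concatenation to combine features with augmentation information, the small augmentation encoder, the layer sizes, the random weights, and all names (`mlp_params`, `mlp`, `conditioned_projector`, `aug_encoder`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    """Random weights and zero biases for a small MLP (illustrative only)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, params):
    """Apply linear layers with ReLU between them (no activation on the last)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical sizes: feature dim, raw augmentation-parameter dim,
# augmentation-embedding dim, projector output dim.
FEAT_DIM, AUG_DIM, AUG_EMB_DIM, OUT_DIM = 32, 8, 16, 16

aug_encoder = mlp_params([AUG_DIM, 16, AUG_EMB_DIM])
projector = mlp_params([FEAT_DIM + AUG_EMB_DIM, 64, OUT_DIM])

def conditioned_projector(features, aug_params):
    """Project features together with an embedding of the augmentation parameters.

    Assumption: conditioning is done by concatenation.
    """
    aug_emb = mlp(aug_params, aug_encoder)
    return mlp(np.concatenate([features, aug_emb], axis=1), projector)

features = rng.normal(size=(4, FEAT_DIM))   # stand-in for feature-extractor outputs
aug_a = rng.uniform(size=(4, AUG_DIM))      # e.g. crop box, color-jitter strengths
aug_b = rng.uniform(size=(4, AUG_DIM))      # a different augmentation of the same images

z_a = conditioned_projector(features, aug_a)
z_b = conditioned_projector(features, aug_b)
print(z_a.shape, float(np.abs(z_a - z_b).max()))
```

In this sketch the same features yield different projections under different augmentation parameters. In actual training, the SSL loss would be computed on such projector outputs, while downstream tasks would use the feature-extractor representations directly.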

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

- This paper is generally well-written. It is easy to understand.
- The idea is simple, intuitive, and seems to be widely applicable.
- The proposed method, CASSLE, outperforms baselines (LooC, AugSelf, and AI) that also learn augmentation-aware information.

Weaknesses

**(1) Lack of comparison with recent augmentation-free SSL methods.** Recently, many augmentation-free self-supervised learning methods have been proposed, including data2vec [1-2], I-JEPA [3], and Masked Image Modeling (MIM) [4-5]. These methods do not use augmentation; in other words, they aim to learn full information about the original images rather than augmentation-invariant representations. Also, since they are often better than MoCo-v2 and SimCLR in vari

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. The identified problem is known and significant for representation learning. The authors discuss the related literature and approaches to its solution fairly well.
2. The idea is fairly novel, although there have been some similar approaches that essentially “condition the projector network”. Please refer to Question 1.
3. Nonetheless, the results are generally convincing that the detail lies at the implementation level rather than the conceptual one.
4. The paper is well-written and well-argued. Overall

Weaknesses

1. Experiments remain relatively small-scale in dataset and model size. In particular, it would have been interesting to examine the effect of conditioning as pretraining data becomes abundant.
2. CASSLE performs better (compared to AugSelf) for contrastive methods and BarlowTwins than for others, i.e. BYOL and SimSiam. A discussion of why this happens would be interesting.
3. Semi-supervised (few-shot classification) results are competitive, but weaker.
4. Experiments on object detection task demonstrat

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

* The manuscript is well written and experiments are well picked to test the purported claims regarding sensitivity of learned features to augmentations applied during training.
* CASSLE is simple and has demonstrated efficacy when training augmentation-based contrastive models. When compared to other methods that condition on augmentations applied during training, Table 1 shows that CASSLE has superior performance across many datasets.

Weaknesses

* Based on Table 7, the proposed method seems to be less effective for SimSiam and BYOL than for InfoNCE-based methods. The manuscript currently claims that CASSLE is applicable to all joint-embedding architectures, but the current experimental results do not demonstrate this.
* The experiments in 4.2 use InfoNCE to evaluate augmentation-awareness, which is sensitive to the negative examples that are used. Instead of this, why not perform linear probing to predict the specific augmentation a

Code & Models

Repositories

gmum/cassle · PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Average Pooling · 1x1 Convolution · Residual Connection · Convolution · Global Average Pooling · Dense Connections · Batch Normalization · Bottleneck Residual Block · InfoNCE