Time to augment self-supervised visual representation learning
Arthur Aubret, Markus Ernst, Céline Teulière, Jochen Triesch

TL;DR
This paper demonstrates that incorporating time-based augmentations during natural interactions significantly enhances self-supervised visual representation learning, making artificial systems more akin to biological vision.
Contribution
It systematically investigates the benefits of time-based augmentations, revealing their superiority over traditional image augmentations in learning object categories.
Findings
3-D object manipulations improve category learning
Viewing objects against changing backgrounds helps discard background information
Time-based augmentations outperform standard image augmentations
Abstract
Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to "augmentations" not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that time-based augmentations achieve large performance gains…
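The idea of a time-based augmentation can be illustrated with a small contrastive-learning sketch: instead of pairing an image with a cropped or flipped copy of itself, a frame is paired with a temporally close frame from the same interaction, and the representation is trained so that the two are more similar to each other than to other frames in the batch. The code below is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `temporal_pairs` and `info_nce`, the parameters `max_gap` and `temperature`, and the use of an InfoNCE-style loss are all illustrative choices that do not come from the abstract, and the embeddings are synthetic stand-ins for encoder outputs.

```python
import numpy as np


def temporal_pairs(num_frames, max_gap=1, rng=None):
    """Pick positive pairs (t, t + gap) of temporally close frame indices.

    Hypothetical helper: `max_gap` bounds how far apart two frames may be
    and is an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(num_frames - max_gap)
    gap = rng.integers(1, max_gap + 1, size=t.shape[0])  # gap in [1, max_gap]
    return t, t + gap


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a should match row i of z_b.

    All other rows of z_b in the batch act as negatives for row i.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature               # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy demonstration with synthetic embeddings (not real encoder outputs).
rng = np.random.default_rng(0)

# Frame indices of positive pairs within a 10-frame sequence.
idx_a, idx_b = temporal_pairs(10, max_gap=2, rng=rng)

# Eight "objects": the second view of each is a slightly perturbed copy,
# standing in for the same object seen a moment later.
view_now = rng.normal(size=(8, 16))
view_later = view_now + 0.05 * rng.normal(size=(8, 16))

loss_matched = info_nce(view_now, view_later)
# Mismatched pairing: each view is paired with a different object's view.
loss_mismatched = info_nce(view_now, np.roll(view_later, 1, axis=0))

print("matched pairs loss:   ", loss_matched)
print("mismatched pairs loss:", loss_mismatched)
```

In this toy setting the loss is lower when each embedding is paired with the later view of the same object than when it is paired with a view of a different object, which is the signal a contrastive objective would use to pull views of the same object together across viewpoint or background changes.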
Taxonomy
Topics: Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques
Methods: Contrastive Learning
