Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction

Qin Wang; Kai Krajsek; Hanno Scharr

arXiv:2412.03314·cs.CV·December 5, 2024

Open Access

TL;DR

This paper introduces a novel augmentation-based self-supervised learning method that incorporates image reconstruction to effectively learn both invariant and equivariant features, improving generalization across various datasets.

Contribution

It proposes a new auxiliary image reconstruction task with a cross-attention mechanism to enhance equivariant feature learning without extra parameters.
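As a rough illustration of how cross-attention can blend features from two augmented views without introducing learned weights, the sketch below applies plain scaled dot-product attention, with tokens from one view as queries and tokens from the other as keys and values. This is only a minimal sketch under stated assumptions: the function name `cross_attention_blend`, the token shapes, and the absence of any projection matrices or decoder are illustrative choices, not the architecture described in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_blend(queries, context):
    """Blend `context` tokens into `queries` with parameter-free attention.

    queries: (n_q, d) tokens from one augmented view (hypothetical shape).
    context: (n_c, d) tokens from the other augmented view.
    Returns the blended tokens (n_q, d) and the attention weights (n_q, n_c).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # similarity of every query to every context token
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights           # convex combination of context tokens

# Toy stand-ins for encoder outputs of two augmented views (random, for illustration only).
rng = np.random.default_rng(0)
tokens_a = rng.normal(size=(4, 8))
tokens_b = rng.normal(size=(6, 8))

blended, weights = cross_attention_blend(tokens_a, tokens_b)
print(blended.shape, weights.shape)
```

In a full pipeline, the blended tokens would be passed to a decoder that reconstructs one of the two views, and a pixel-level reconstruction loss would be added to the usual self-supervised objective; both parts are omitted here.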

Findings

1. Significant improvements over standard methods on artificial and natural datasets.
2. Enhanced learning of both invariant and equivariant features.
3. Better performance in downstream tasks involving combined augmentations.

Abstract

Augmentation-based self-supervised learning methods have shown remarkable success in self-supervised visual representation learning, excelling in learning invariant features but often neglecting equivariant ones. This limitation reduces the generalizability of foundation models, particularly for downstream tasks requiring equivariance. We propose integrating an image reconstruction task as an auxiliary component in augmentation-based self-supervised learning algorithms to facilitate equivariant feature learning without additional parameters. Our method implements a cross-attention mechanism to blend features learned from two augmented views, subsequently reconstructing one of them. This approach is adaptable to various datasets and augmented-pair based learning methods. We evaluate its effectiveness on learning equivariant features through multiple linear regression tasks and downstream…
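One common way to probe equivariance with linear regression is to fit a linear map from frozen features to a parameter of the applied augmentation and report the coefficient of determination. The sketch below does this on synthetic data. The helper `linear_probe_r2`, the choice of a scalar augmentation parameter as the regression target, and the toy features are all assumptions made for illustration; the truncated abstract does not specify the exact regression targets or protocol.

```python
import numpy as np

def linear_probe_r2(features, targets):
    """Fit ordinary least squares (with intercept) and return in-sample R^2."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    pred = X @ coef
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumption): `aug_param` plays the role of an augmentation
# parameter such as a normalized rotation angle.
rng = np.random.default_rng(0)
aug_param = rng.uniform(0.0, 1.0, size=200)
noise = rng.normal(size=(200, 4))

# Features that retain the augmentation parameter versus features that discard it.
equivariant_feats = np.column_stack([aug_param, noise])
invariant_feats = noise

r2_equi = linear_probe_r2(equivariant_feats, aug_param)
r2_inv = linear_probe_r2(invariant_feats, aug_param)
print(r2_equi, r2_inv)
```

Features that keep the augmentation information yield an R^2 near 1, while features that discard it yield an R^2 near 0. The score here is computed in-sample for brevity; a real evaluation would fit on a training split and report on held-out data.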

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition

Methods: Linear Regression