Self-supervised visual learning from interactions with objects

Arthur Aubret; Céline Teulière; Jochen Triesch

arXiv:2407.06704 · cs.CV · August 9, 2024

TL;DR

This paper proposes a method to enhance self-supervised visual learning by incorporating object-related actions, leading to improved object category recognition through better viewpoint alignment.

Contribution

It introduces a novel loss function that aligns action and visual embeddings, leveraging embodied interactions to structure visual representations in SSL.

Findings

01. Outperforms previous SSL methods on category recognition tasks.
02. Improves viewpoint-wise alignment of objects within the same category.
03. Embodied actions contribute to more robust visual representations.

Abstract

Self-supervised learning (SSL) has revolutionized visual representation learning, but has not achieved the robustness of human vision. A reason for this could be that SSL does not leverage all the data available to humans during learning. When learning about an object, humans often purposefully turn or move around objects and research suggests that these interactions can substantially enhance their learning. Here we explore whether such object-related actions can boost SSL. For this, we extract the actions performed to change from one ego-centric view of an object to another in four video datasets. We then introduce a new loss function to learn visual and action embeddings by aligning the performed action with the representations of two images extracted from the same clip. This permits the performed actions to structure the latent visual representation. Our experiments show that our…
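The alignment idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an InfoNCE-style contrastive objective, and the names `pair_embedding`, `action_alignment_loss`, and `temperature` are hypothetical. In particular, the embedding difference used to summarize the two views is only a stand-in for whatever learned module the paper uses.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_embedding(z_before, z_after):
    # ASSUMPTION: the change between two views is summarized by the
    # difference of their visual embeddings. A real model would more
    # likely use a learned projection head here.
    return [b - a for a, b in zip(z_before, z_after)]


def action_alignment_loss(pair_embs, action_embs, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact loss).

    The i-th image-pair embedding should be closer to the i-th action
    embedding than to the other action embeddings in the batch.
    """
    pairs = [l2_normalize(p) for p in pair_embs]
    actions = [l2_normalize(a) for a in action_embs]
    total = 0.0
    for i, p in enumerate(pairs):
        logits = [dot(p, a) / temperature for a in actions]
        # Numerically stable log-sum-exp over all candidate actions.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(pairs)


# Toy batch: three clips, 2-D embeddings, one action per clip.
actions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_before = [[0.0, 0.0]] * 3
z_after = actions  # each view change matches its own action exactly
pairs = [pair_embedding(a, b) for a, b in zip(z_before, z_after)]

aligned = action_alignment_loss(pairs, actions)
shuffled = action_alignment_loss(pairs, actions[1:] + actions[:1])
```

In this toy batch the loss is lower when each image pair is matched with the action that produced it (`aligned`) than when the actions are rotated by one position (`shuffled`), which is the sense in which the performed action can structure the visual embedding space.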

Code & Models

Repositories

trieschlab/aassl (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques