Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training

Valerio Biscione; Jeffrey S. Bowers

arXiv:2110.01476·cs.CV·April 5, 2022

Open Access

TL;DR

This paper demonstrates that standard CNNs can learn strong visual invariances to various transformations for novel objects with minimal training data, and that self-supervised learning, which more closely resembles human learning, can achieve similar results.

Contribution

The paper shows that CNNs can acquire online invariances to transformations from limited training data, and that self-supervised methods can replicate this, in line with human learning processes.

Findings

1. CNNs trained on synthetic objects develop invariances with as few as 50 objects.

2. Invariances extend to real-world object datasets.

3. Self-supervised training achieves similar invariance acquisition.

Abstract

Humans can identify objects following various spatial transformations such as scale and viewpoint. This extends to novel objects, after a single presentation at a single pose, sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through the analysis of models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few…
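The abstract refers to analysing models' internal representations to assess invariance. As a rough illustration of what such a measurement can look like, the sketch below compares the representation of an image with that of a transformed copy using cosine similarity. The choice of cosine similarity, the toy image, and the two "representations" (`flatten_features`, `pooled_features`) are assumptions made for illustration only; they are not taken from the paper and are not the authors' networks or metric.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate_right(image, shift):
    """Circularly shift every row of a 2D list to the right by `shift` pixels."""
    return [row[-shift:] + row[:-shift] for row in image]

def flatten_features(image):
    """Hypothetical non-invariant representation: the raw pixels."""
    return [pixel for row in image for pixel in row]

def pooled_features(image):
    """Hypothetical representation invariant to horizontal shifts: row sums."""
    return [sum(row) for row in image]

def invariance_score(features, image, transformed):
    """Similarity between the representations of an image and its transformed copy."""
    return cosine_similarity(features(image), features(transformed))

# Toy 3x4 "image" (an assumption, standing in for a rendered object).
image = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
shifted = translate_right(image, 2)

raw_score = invariance_score(flatten_features, image, shifted)
pooled_score = invariance_score(pooled_features, image, shifted)
print(f"raw pixels: {raw_score:.3f}, pooled: {pooled_score:.3f}")
```

In this toy case the raw-pixel representation scores 0 (the shifted object shares no active pixels with the original), while the pooled representation scores 1 up to floating-point rounding. A score near 1 for a transformed novel object is the kind of signature an invariant representation would show under this assumed metric.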

Taxonomy

Topics: Face Recognition and Perception · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning