Using Navigational Information to Learn Visual Representations

Lizhen Zhu; Brad Wyble; James Z. Wang

arXiv:2202.08114 · cs.CV · February 17, 2022

TL;DR

This paper demonstrates that incorporating navigational spatial and temporal information into contrastive learning enhances the quality of visual representations, surpassing traditional instance discrimination methods in downstream classification tasks.

Contribution

It introduces a novel pretraining pipeline that leverages self-generated navigational data in a photorealistic environment to improve self-supervised visual learning.

Findings

01 Spatial and temporal information improves representation quality

02 Downstream classification performance improves

03 Contextual information is more effective than instance discrimination

Abstract

Children learn to build a visual representation of the world through unsupervised exploration. We hypothesize that a key part of this learning ability is the use of self-generated navigational information as a similarity label that drives a self-supervised learning objective. The goal of this work is to exploit navigational information in a visual environment to achieve performance that exceeds state-of-the-art self-supervised training. Here, we show that using spatial and temporal information in the pretraining stage of contrastive learning can improve downstream classification performance relative to conventional contrastive learning approaches, which use instance discrimination to distinguish between two alterations of the same image and two different images. We designed a pipeline to generate egocentric-vision images from a photorealistic ray-tracing…
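The idea of using navigational information as a similarity label can be illustrated with a small sketch. This is not the authors' implementation: the function names, the distance and time thresholds (`max_dist`, `max_dt`), and the choice of a supervised-contrastive-style loss are all assumptions made for illustration. The sketch treats two frames as a positive pair when they were captured close together in space and time, then scores a batch of embeddings against that mask.

```python
import numpy as np


def navigational_positive_mask(positions, timestamps, max_dist=1.0, max_dt=2.0):
    """Return an (N, N) boolean mask of positive pairs.

    Two frames count as positives when they are within max_dist of each
    other in space AND within max_dt in time. Both thresholds are
    hypothetical values chosen for this sketch.
    """
    positions = np.asarray(positions, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Pairwise Euclidean distance between camera positions.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    # Pairwise absolute time difference.
    dt = np.abs(timestamps[:, None] - timestamps[None, :])
    mask = (dist <= max_dist) & (dt <= max_dt)
    np.fill_diagonal(mask, False)  # a frame is not its own positive
    return mask


def contrastive_loss(embeddings, positive_mask, temperature=0.1):
    """Contrastive loss with positives given by a mask.

    For each anchor, the loss is the negative mean log-probability of its
    positives under a softmax over all other frames in the batch. Anchors
    without any positive are skipped. Assumes at least two frames and
    non-zero embeddings.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    positive_mask = np.asarray(positive_mask, dtype=bool) & ~self_mask

    logits = z @ z.T / temperature
    logits = np.where(self_mask, -np.inf, logits)  # exclude self-comparisons

    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    pos_counts = positive_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positive_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_counts[valid]).mean())


if __name__ == "__main__":
    # Four frames: two captured near the origin, two captured far away later.
    positions = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0]]
    timestamps = [0.0, 1.0, 50.0, 51.0]
    mask = navigational_positive_mask(positions, timestamps)
    embeddings = np.random.default_rng(0).normal(size=(4, 8))
    print(mask)
    print(contrastive_loss(embeddings, mask))
```

In a real pretraining setup the embeddings would come from an encoder network and the loss would be minimized by gradient descent. The point here is only that the positive/negative labels come from where and when each frame was captured, rather than from image augmentations alone.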

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning