Using Navigational Information to Learn Visual Representations
Lizhen Zhu, Brad Wyble, James Z. Wang

TL;DR
This paper demonstrates that incorporating navigational spatial and temporal information into contrastive learning enhances the quality of visual representations, surpassing traditional instance discrimination methods in downstream classification tasks.
Contribution
It introduces a novel pretraining pipeline that leverages self-generated navigational data in a photorealistic environment to improve self-supervised visual learning.
Findings
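Spatial and temporal information used during contrastive pretraining improves representation quality
Downstream classification performance improves relative to conventional contrastive learning
Contextual (navigational) information is a more effective similarity signal than instance discrimination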
Abstract
Children learn to build a visual representation of the world from unsupervised exploration, and we hypothesize that a key part of this learning ability is the use of self-generated navigational information as a similarity label to drive a learning objective for self-supervised learning. The goal of this work is to exploit navigational information in a visual environment to achieve training performance that exceeds state-of-the-art self-supervised training. Here, we show that using spatial and temporal information in the pretraining stage of contrastive learning can improve the performance of downstream classification relative to conventional contrastive learning approaches that use instance discrimination to distinguish between two alterations of the same image or two different images. We designed a pipeline to generate egocentric-vision images from a photorealistic ray-tracing…
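The idea of using navigational information as a similarity label can be illustrated with a small sketch. This is not the authors' pipeline: it assumes, purely for illustration, that two frames count as a positive pair when the camera positions and timestamps are within fixed thresholds, and it uses a generic multi-positive contrastive loss. The function names, the thresholds `max_dist` and `max_dt`, and the `temperature` value are all hypothetical.

```python
import numpy as np


def navigational_positive_mask(positions, timestamps, max_dist=1.0, max_dt=2.0):
    """Mark frame pairs (i, j) as positives when they are close in space and time.

    The thresholds are hypothetical; the abstract does not specify them.
    """
    # Pairwise Euclidean distance between camera positions.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    # Pairwise absolute time gap between frames.
    dt = np.abs(timestamps[:, None] - timestamps[None, :])
    mask = (dist <= max_dist) & (dt <= max_dt)
    np.fill_diagonal(mask, False)  # a frame is not its own positive
    return mask


def contrastive_loss(embeddings, pos_mask, temperature=0.1):
    """Generic contrastive loss averaged over each anchor's positives."""
    if not pos_mask.any():
        raise ValueError("no positive pairs in this batch")
    # Cosine similarity via L2-normalised embeddings (assumed non-zero).
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    # Numerically stable row-wise log-softmax.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    # Average the log-probability of the positives for anchors that have any.
    has_pos = pos_mask.any(axis=1)
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_mask[has_pos].sum(axis=1)
    return per_anchor.mean()
```

In conventional instance discrimination, the positive mask would instead link two augmentations of the same image; here the mask comes from where and when each frame was captured, so no image labels are needed.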
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
