LoCUS: Learning Multiscale 3D-consistent Features from Posed Images
Dominik A. Kloepfer, Dylan Campbell, João F. Henriques

TL;DR
LoCUS is a self-supervised method for learning multiscale, 3D-consistent features from posed images, enabling spatial mapping and localization without labeled data.
Contribution
It proposes a retrieval-based training objective that balances feature reusability and specificity, allowing multiscale spatial feature learning from images.
Findings
Effective multiscale spatial maps with identifiable landmarks
Improved landmark retrieval and localization accuracy
Versatile application in segmentation tasks
Abstract
An important challenge for autonomous agents such as robots is to maintain a spatially and temporally consistent model of the world. It must be maintained through occlusions, previously-unseen views, and long time horizons (e.g., loop closure and re-identification). It is still an open question how to train such a versatile neural representation without supervision. We start from the idea that the training objective can be framed as a patch retrieval problem: given an image patch in one view of a scene, we would like to retrieve (with high precision and recall) all patches in other views that map to the same real-world location. One drawback is that this objective does not promote reusability of features: by being unique to a scene (achieving perfect precision/recall), a representation will not be useful in the context of other scenes. We find that it is possible to balance retrieval…
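The patch retrieval framing in the abstract can be made concrete with a small sketch. The code below is not the paper's training objective; it only illustrates how precision and recall could be measured for one query patch, assuming each patch has a feature vector and a known 3D world position (available in principle from posed images). The function name, the `radius` that defines "same real-world location", and the `max_feat_dist` retrieval threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def retrieval_precision_recall(features, positions, query, radius, max_feat_dist):
    """Precision and recall for retrieving patches at the query's 3D location.

    Hypothetical helper: `radius` and `max_feat_dist` are illustrative
    thresholds, not values or definitions taken from the paper.
    """
    # Ground-truth positives: patches whose world position is within `radius`
    # of the query patch's world position.
    world_dist = np.linalg.norm(positions - positions[query], axis=1)
    positive = world_dist <= radius

    # Retrieved set: patches whose feature is within `max_feat_dist`
    # of the query patch's feature.
    feat_dist = np.linalg.norm(features - features[query], axis=1)
    retrieved = feat_dist <= max_feat_dist

    # The query patch itself is neither a positive nor a retrieval.
    positive[query] = False
    retrieved[query] = False

    true_pos = np.sum(positive & retrieved)
    precision = true_pos / max(retrieved.sum(), 1)
    recall = true_pos / max(positive.sum(), 1)
    return float(precision), float(recall)

# Toy data: patches 1 and 3 lie near patch 0 in the world; patch 2 is far away.
positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 0.0, 0.0], [0.05, 0.0, 0.0]])
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.0], [0.0, 1.0]])

precision, recall = retrieval_precision_recall(
    features, positions, query=0, radius=0.5, max_feat_dist=0.2
)
```

In this toy case the features retrieve one true match (patch 1), one false match (patch 2), and miss patch 3, so both precision and recall fall below 1. A representation tuned to score perfectly on a single scene is the case the abstract warns about: it would be unique to that scene and not reusable elsewhere.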
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning
