Self-Improving Visual Odometry
Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

TL;DR
This paper introduces a self-supervised learning framework for visual odometry in which the frontend improves over time by retraining on large-scale unlabeled monocular video, and reports outperforming traditional descriptors (SIFT, ORB, AKAZE) and learned methods (SuperPoint, LF-Net).
Contribution
It presents a novel self-supervised approach enabling a VO frontend to learn and adapt over time without manual tuning or labeled data.
Findings
Outperforms traditional feature descriptors like SIFT, ORB, AKAZE.
Outperforms deep learning methods SuperPoint and LF-Net.
Automatically identifies less useful keypoints, such as those on shadows and dynamic objects.
Abstract
We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend, a network which computes pointwise data associations across images. Our self-improving method enables a VO frontend to learn over time, unlike other VO and SLAM systems which require time-consuming hand-tuning or expensive data collection to adapt to new environments. Our proposed frontend operates on monocular images and consists of a single multi-task convolutional neural network which outputs 2D keypoint locations, keypoint descriptors, and a novel point stability score. We use the output of VO to create a self-supervised dataset of point correspondences to retrain the frontend. When trained using VO at scale on 2.5 million monocular images from ScanNet, the stability classifier automatically discovers a…
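To make "pointwise data associations" concrete, here is a minimal sketch of how the three frontend outputs named in the abstract (keypoint locations, descriptors, stability score) could be combined to match points between two frames. It is an illustration only, not the authors' implementation: the `Keypoint` class, the `match_stable` function, the `min_stability` threshold, and the choice of mutual nearest-neighbor matching are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Keypoint:
    """Hypothetical container for one frontend detection."""
    xy: Tuple[float, float]   # 2D keypoint location in the image
    descriptor: List[float]   # keypoint descriptor vector
    stability: float          # assumed to be a score in [0, 1]


def _sq_dist(a: List[float], b: List[float]) -> float:
    # Squared Euclidean distance between two descriptor vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(query: Keypoint, candidates: List[Keypoint]) -> int:
    # Index of the candidate whose descriptor is closest to the query's.
    return min(
        range(len(candidates)),
        key=lambda j: _sq_dist(query.descriptor, candidates[j].descriptor),
    )


def match_stable(
    frame_a: List[Keypoint],
    frame_b: List[Keypoint],
    min_stability: float = 0.5,  # assumed threshold, not from the paper
) -> List[Tuple[Tuple[float, float], Tuple[float, float]]]:
    """Match keypoints across two frames, ignoring low-stability points."""
    # Drop points the (hypothetical) stability score marks as unreliable.
    a = [k for k in frame_a if k.stability >= min_stability]
    b = [k for k in frame_b if k.stability >= min_stability]
    if not a or not b:
        return []

    matches = []
    for i, ka in enumerate(a):
        j = _nearest(ka, b)
        # Keep the pair only if each point is the other's nearest neighbor.
        if _nearest(b[j], a) == i:
            matches.append((ka.xy, b[j].xy))
    return matches
```

Calling `match_stable(frame_a, frame_b)` returns a list of `(xy_in_a, xy_in_b)` pairs. In a pipeline like the one the abstract describes, correspondences of this kind would be the input to VO, whose output is then used to build the self-supervised dataset for retraining.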
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
