TartanVO: A Generalizable Learning-based VO

Wenshan Wang; Yaoyu Hu; Sebastian Scherer

arXiv:2011.00359·cs.CV·November 3, 2020·6 cites

TL;DR

TartanVO is a learning-based visual odometry model trained only on synthetic TartanAir data. A single model generalizes to real-world datasets such as KITTI and EuRoC without fine-tuning, and it outperforms geometry-based methods on challenging trajectories.

Contribution

It is the first learning-based VO model that generalizes across datasets by leveraging synthetic data, an up-to-scale loss, and camera intrinsic parameters.
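The listing does not say how the camera intrinsics enter the model. As a minimal sketch of one possible encoding, the function below builds two extra input channels holding each pixel's normalized image coordinates. The name `intrinsics_layer` and its parameters (`fx`, `fy` for focal lengths, `cx`, `cy` for the principal point) are hypothetical and chosen for illustration only.

```python
def intrinsics_layer(width, height, fx, fy, cx, cy):
    """Return two height-by-width grids of normalized pixel coordinates.

    Hypothetical encoding (an assumption, not taken from the source):
    x_channel[v][u] = (u - cx) / fx and y_channel[v][u] = (v - cy) / fy.
    """
    x_channel = [[(u - cx) / fx for u in range(width)] for _ in range(height)]
    y_channel = [[(v - cy) / fy for _ in range(width)] for v in range(height)]
    return x_channel, y_channel


# Example: a tiny 4x3 image with made-up intrinsics.
xs, ys = intrinsics_layer(width=4, height=3, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Under this assumed encoding, images from cameras with different focal lengths or principal points yield different coordinate grids, which would give a network an explicit cue about the camera geometry.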

Findings

01 Single synthetic-trained model generalizes to real datasets

02 Outperforms geometry-based methods in challenging scenes

03 Effective without fine-tuning on real data

Abstract

We present the first learning-based visual odometry (VO) model, which generalizes to multiple datasets and real-world scenarios and outperforms geometry-based methods in challenging scenes. We achieve this by leveraging the SLAM dataset TartanAir, which provides a large amount of diverse synthetic data in challenging environments. Furthermore, to make our VO model generalize across datasets, we propose an up-to-scale loss function and incorporate the camera intrinsic parameters into the model. Experiments show that a single model, TartanVO, trained only on synthetic data, without any finetuning, can be generalized to real-world datasets such as KITTI and EuRoC, demonstrating significant advantages over the geometry-based methods on challenging trajectories. Our code is available at https://github.com/castacks/tartanvo.
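Monocular VO cannot recover the absolute scale of translation, so a loss that compares raw translation vectors penalizes a quantity the model cannot observe. The abstract names an up-to-scale loss without giving its form. The sketch below shows one plausible version under that assumption: both translations are normalized to unit length before comparison, and a rotation term is added unchanged. The function names, the squared-error form, and the `eps` guard are illustrative choices and not the authors' implementation.

```python
import math


def normalize(v, eps=1e-6):
    """Scale a vector to unit length; eps guards against division by zero."""
    norm = math.sqrt(sum(c * c for c in v))
    scale = max(norm, eps)
    return [c / scale for c in v]


def up_to_scale_loss(t_pred, t_true, r_pred, r_true, eps=1e-6):
    """Squared error on unit-normalized translations plus squared rotation error.

    Assumed form for illustration: the translation term depends only on
    direction, so any positive rescaling of either translation leaves it unchanged.
    """
    tp = normalize(t_pred, eps)
    tt = normalize(t_true, eps)
    trans_term = sum((a - b) ** 2 for a, b in zip(tp, tt))
    rot_term = sum((a - b) ** 2 for a, b in zip(r_pred, r_true))
    return trans_term + rot_term


# Same direction, different magnitude: the translation term vanishes.
same_direction = up_to_scale_loss([1, 0, 0], [5, 0, 0], [0, 0, 0], [0, 0, 0])

# Orthogonal directions: the translation term is at its midpoint value of 2.
orthogonal = up_to_scale_loss([1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0])
```

Because the translation term ignores magnitude, a model trained this way predicts direction only, and the metric scale has to come from another source at evaluation time.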

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques