Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World
Sen Zhang, Jing Zhang, Dacheng Tao

TL;DR
This paper introduces VRVO, a framework that leverages virtual data and adversarial training to achieve scale-consistent monocular visual odometry without requiring stereo or ground-truth data in real-world scenarios.
Contribution
The paper proposes a novel virtual-to-real domain adaptation method for monocular VO that ensures scale consistency and integrates a mutual reinforcement pipeline for improved robustness.
Findings
Effective scale recovery using virtual data and adversarial training.
Improved long-term trajectory accuracy on the KITTI dataset.
Robustness enhancement through bidirectional learning and optimization.
Abstract
Monocular visual odometry (VO) has attracted extensive research attention by providing real-time vehicle motion from cost-effective camera images. However, state-of-the-art optimization-based monocular VO methods suffer from the scale inconsistency problem for long-term predictions. Deep learning has recently been introduced to address this issue by leveraging stereo sequences or ground-truth motions in the training dataset. However, it comes at an additional cost for data collection, and such training data may not be available in all datasets. In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases. Specifically, we first train a scale-aware disparity network using…
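The scale problem the abstract refers to follows from a general property of monocular geometry: if every scene point and the camera translation are multiplied by the same factor, the projected pixels do not change, so images alone cannot fix the absolute scale. The sketch below illustrates only that property and is not code from VRVO. The pinhole intrinsics, the sample points, and the helper names `project` and `scale_scene` are all assumptions chosen for illustration.

```python
import math


def project(point, yaw, t, f=700.0, cx=600.0, cy=180.0):
    """Project a 3D point into a camera displaced by a yaw rotation and translation t.

    f, cx, cy are illustrative pinhole intrinsics (assumed, not from the paper).
    """
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate about the vertical (y) axis, then translate into the second camera frame.
    xc = c * x + s * z + t[0]
    yc = y + t[1]
    zc = -s * x + c * z + t[2]
    # Perspective division: only the ratios xc/zc and yc/zc reach the image.
    return (f * xc / zc + cx, f * yc / zc + cy)


def scale_scene(points, t, s):
    """Scale every 3D point and the camera translation by the same factor s."""
    scaled_points = [tuple(s * v for v in p) for p in points]
    scaled_t = tuple(s * v for v in t)
    return scaled_points, scaled_t


# Hypothetical scene and motion (arbitrary units).
points = [(1.0, 0.5, 10.0), (-2.0, 0.2, 15.0), (0.5, -1.0, 8.0)]
t = (0.1, 0.0, -1.0)
yaw = 0.02

pixels = [project(p, yaw, t) for p in points]

# Make the whole world three times larger, including the camera motion.
scaled_points, scaled_t = scale_scene(points, t, 3.0)
scaled_pixels = [project(p, yaw, scaled_t) for p in scaled_points]

# Largest pixel difference between the two worlds; zero up to rounding error.
max_diff = max(
    abs(a - b)
    for pa, pb in zip(pixels, scaled_pixels)
    for a, b in zip(pa, pb)
)
print(max_diff)
```

Because the two worlds produce the same pixels, a monocular system needs an outside source of metric information to pin down the scale, which is the role the abstract assigns to virtual data.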
