Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry
Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu and Wanqing Li

TL;DR
This paper introduces an unsupervised visual odometry method that pairs a Transformer-based pose estimator over a short local temporal window (TAPE) with a flow-based pairwise pose estimator (F2FPE); the two are tied together in training by a consistency loss.
Contribution
It proposes a dual-estimator framework that combines a Transformer-guided geometry model with flow-based pairwise pose estimation, improving accuracy over existing unsupervised methods.
Findings
Outperforms state-of-the-art unsupervised methods on KITTI and Malaga datasets.
Achieves comparable results to supervised and traditional methods.
Demonstrates robustness and efficiency in visual odometry tasks.
Abstract
Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate the temporal information using recurrent neural networks over a long sequence of images. They are either not accurate, time-consuming in training or error accumulative. In this paper, we propose a method consisting of two camera pose estimators that deal with the information from pairwise images and a short sequence of images respectively. For image sequences, a Transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as Transformer-based Auxiliary Pose Estimator (TAPE). Meanwhile, a Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained through a simple yet effective consistency loss in training. Empirical evaluation has shown that the proposed method outperforms the…
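The consistency constraint between the two estimators can be illustrated with a small sketch. The abstract does not give the form of the loss, so everything here is an assumption made for illustration: poses are represented as hypothetical 6-DoF vectors (three translation and three rotation components), the penalty is a mean absolute difference, and the function name `pose_consistency_loss` is invented. It is not the authors' implementation.

```python
from typing import Sequence

# Hypothetical pose layout (an assumption, not from the paper):
# [tx, ty, tz, rx, ry, rz] for one relative camera motion.
Pose = Sequence[float]


def pose_consistency_loss(pairwise_poses: Sequence[Pose],
                          sequence_poses: Sequence[Pose]) -> float:
    """Mean absolute difference between two sets of relative pose estimates.

    `pairwise_poses` stands in for the output of a pairwise estimator and
    `sequence_poses` for the output of a short-sequence estimator, both
    covering the same frame pairs in the same order.
    """
    if not pairwise_poses:
        raise ValueError("at least one pose pair is required")
    if len(pairwise_poses) != len(sequence_poses):
        raise ValueError("both estimators must cover the same frame pairs")

    total = 0.0
    count = 0
    for p, q in zip(pairwise_poses, sequence_poses):
        if len(p) != len(q):
            raise ValueError("pose vectors must have the same length")
        # Accumulate the element-wise absolute disagreement.
        total += sum(abs(a - b) for a, b in zip(p, q))
        count += len(p)
    return total / count


# Two frame pairs; the estimators disagree slightly on tz and tx.
pairwise = [[0.10, 0.0, 1.00, 0.0, 0.01, 0.0],
            [0.12, 0.0, 0.98, 0.0, 0.02, 0.0]]
sequence = [[0.10, 0.0, 1.02, 0.0, 0.01, 0.0],
            [0.10, 0.0, 0.98, 0.0, 0.02, 0.0]]
loss = pose_consistency_loss(pairwise, sequence)
```

The loss is zero when the two estimators agree on every frame pair and grows with their disagreement, which is the general role a consistency term plays. In an actual training setup, translation and rotation would likely be weighted separately and the computation would run on differentiable tensors rather than Python lists.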
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
