Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry

Xiangyu Li; Yonghong Hou; Pichao Wang; Zhimin Gao; Mingliang Xu; Wanqing Li

arXiv:2101.02143·cs.CV·January 7, 2021



TL;DR

This paper introduces an unsupervised visual odometry method that pairs a Transformer-based pose estimator over a local temporal window (TAPE) with a flow-based pairwise pose estimator (F2FPE), and ties the two together with a consistency loss during training.

Contribution

It proposes a dual-estimator framework in which a Transformer-guided geometry model over short image sequences and a flow-based pose estimator over image pairs constrain each other during training.

Findings

01. Outperforms state-of-the-art unsupervised methods on the KITTI and Malaga datasets.

02. Achieves results comparable to supervised and traditional methods.

03. Demonstrates robustness and efficiency in visual odometry tasks.

Abstract

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information using recurrent neural networks over a long sequence of images. These approaches are inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle information from pairwise images and from a short sequence of images, respectively. For image sequences, a Transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the Transformer-based Auxiliary Pose Estimator (TAPE). Meanwhile, a Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained through a simple yet effective consistency loss in training. Empirical evaluation has shown that the proposed method outperforms the…
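The abstract does not give the form of the consistency loss, so the following is only a minimal sketch of what such a term could look like, not the paper's formulation. It assumes each estimator outputs one relative pose per frame pair as a 6-DoF vector (translation followed by rotation angles) and penalizes the mean absolute difference between the two sets of predictions. The function name `consistency_loss`, the pose layout, and the `rot_weight` parameter are all hypothetical.

```python
from typing import Sequence

# Hypothetical pose layout (an assumption, not from the paper):
# (tx, ty, tz, rx, ry, rz) = translation followed by rotation angles.
Pose = Sequence[float]


def consistency_loss(
    seq_poses: Sequence[Pose],
    pair_poses: Sequence[Pose],
    rot_weight: float = 1.0,
) -> float:
    """Mean L1 disagreement between two sets of relative-pose predictions.

    seq_poses:  poses from a sequence-based estimator (TAPE-like).
    pair_poses: poses from a pairwise estimator (F2FPE-like),
                one per frame pair, in the same order.
    rot_weight: illustrative weight on the rotation term.
    """
    if len(seq_poses) != len(pair_poses):
        raise ValueError("both estimators must cover the same frame pairs")
    if not seq_poses:
        raise ValueError("at least one frame pair is required")

    total = 0.0
    for p, q in zip(seq_poses, pair_poses):
        # Absolute difference on the translation components.
        trans = sum(abs(a - b) for a, b in zip(p[:3], q[:3]))
        # Absolute difference on the rotation components.
        rot = sum(abs(a - b) for a, b in zip(p[3:6], q[3:6]))
        total += trans + rot_weight * rot
    # Average over frame pairs in the local window.
    return total / len(seq_poses)
```

In a training loop, a term of this kind would be added to the other unsupervised objectives so that the two estimators are pushed toward agreeing on the same camera motion; the choice of L1 distance and of a single rotation weight here is purely illustrative.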

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques