Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry

André O. Françani; Marcos R. O. A. Maximo

arXiv:2210.01723 · cs.CV · January 9, 2023 · 1 citation

TL;DR

This paper applies a dense prediction transformer model to scale estimation in monocular visual odometry. Accurate depth maps from the model reduce scale drift, and the resulting system achieves performance competitive with the state of the art on a visual odometry benchmark.

Contribution

It presents an application of the dense prediction transformer model to scale estimation in monocular visual odometry systems.

Findings

01

Reduced scale drift in monocular visual odometry

02

Obtained the scale from accurate depth maps estimated by the dense prediction transformer

03

Demonstrated effectiveness on a visual odometry benchmark

Abstract

Monocular visual odometry estimates the position of an agent from the images of a single camera, with applications in autonomous vehicles, medical robots, and augmented reality. However, monocular systems suffer from the scale ambiguity problem because 2D frames carry no depth information. This paper shows an application of the dense prediction transformer model to scale estimation in monocular visual odometry systems. Experimental results show that accurate depth maps estimated by this model reduce the scale drift of monocular systems, achieving performance competitive with the state of the art on a visual odometry benchmark.
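
To make the idea of depth-based scale estimation concrete, the following is a minimal sketch of one common way to recover scale from a predicted depth map. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: it assumes that up-to-scale depths have already been triangulated at some pixels from the monocular motion estimate, and that a depth network supplies metric depths at the same pixels. The function names `estimate_scale` and `rescale_translation` and the `min_depth` threshold are hypothetical.

```python
import numpy as np


def estimate_scale(triangulated_depth, predicted_depth, min_depth=1e-3):
    """Return the median ratio of predicted to triangulated depth.

    triangulated_depth: up-to-scale depths from the monocular motion estimate.
    predicted_depth: metric depths from a depth network at the same pixels.
    Both inputs are assumptions of this sketch, not outputs of a specific library.
    """
    tri = np.asarray(triangulated_depth, dtype=float)
    pred = np.asarray(predicted_depth, dtype=float)

    # Keep only pairs where both depths are finite and positive.
    valid = (
        np.isfinite(tri)
        & np.isfinite(pred)
        & (tri > min_depth)
        & (pred > min_depth)
    )
    if not np.any(valid):
        raise ValueError("no valid depth pairs to estimate scale from")

    # The median is less sensitive to outlier pixels than the mean.
    return float(np.median(pred[valid] / tri[valid]))


def rescale_translation(t_unit, scale):
    """Normalize a translation direction and give it the estimated scale."""
    t = np.asarray(t_unit, dtype=float)
    return scale * t / np.linalg.norm(t)


if __name__ == "__main__":
    tri = np.array([1.0, 2.0, 4.0, np.nan, 0.0])
    pred = np.array([2.5, 5.0, 10.0, 3.0, 1.0])
    s = estimate_scale(tri, pred)
    print(s, rescale_translation([0.0, 0.0, 2.0], s))
```

The median ratio is chosen here because individual triangulated points are often noisy; other robust estimators would serve the same purpose.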

Code & Models

Repositories

aofrancani/dpt-vo
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications